Skip to main content

Analyzing postprandial metabolomics data using multiway models: a simulation study

Abstract

Background

Analysis of time-resolved postprandial metabolomics data can improve the understanding of metabolic mechanisms, potentially revealing biomarkers for early diagnosis of metabolic diseases and advancing precision nutrition and medicine. Postprandial metabolomics measurements at several time points from multiple subjects can be arranged as a subjects by metabolites by time points array. Traditional analysis methods are limited in terms of revealing subject groups, related metabolites, and temporal patterns simultaneously from such three-way data.

Results

We introduce an unsupervised multiway analysis approach based on the CANDECOMP/PARAFAC (CP) model for improved analysis of postprandial metabolomics data guided by a simulation study. Because of the lack of ground truth in real data, we generate simulated data using a comprehensive human metabolic model. This allows us to assess the performance of CP models in terms of revealing subject groups and underlying metabolic processes. We study three analysis approaches: analysis of fasting-state data using principal component analysis, T0-corrected data (i.e., data corrected by subtracting fasting-state data) using a CP model and full-dynamic (i.e., full postprandial) data using CP. Through extensive simulations, we demonstrate that CP models capture meaningful and stable patterns from simulated meal challenge data, revealing underlying mechanisms and differences between diseased versus healthy groups.

Conclusions

Our experiments show that it is crucial to analyze both fasting-state and T0-corrected data for understanding metabolic differences among subject groups. Depending on the nature of the subject group structure, the best group separation may be achieved by CP models of T0-corrected or full-dynamic data. This study introduces an improved analysis approach for postprandial metabolomics data while also shedding light on the debate about correcting baseline values in longitudinal data analysis.

Peer Review reports

Background

Postprandial metabolomics data (also referred to as meal challenge test data) includes the metabolic transition from the fasting to the fed state. Analysis of such data can reveal the underlying biological processes and improve the understanding of metabolic mechanisms [1, 2]. Examples include the study of a high-fat meal response for over one thousand subjects, which revealed subclass patterns in lipoprotein metabolism [3]. Analysis of postprandial metabolomics data also holds the promise to reveal new biomarkers especially for cardiometabolic diseases [4, 5]. In addition to detecting biomarkers and advancing prediagnosis, studying the postprandial state has proven useful in designing personalized nutrition [6]. Because of the high variability between individuals in terms of their post-meal glucose and triglyceride levels, personalized dietary interventions can be considered to lower individual post-meal glucose, thus may help decrease the risk of diseases such as prediabetes [6, 7].

There are various methods available for analyzing meal challenge test data. These are mainly supervised approaches, i.e., assuming that the group information (e.g., labels such as healthy and diseased) is known and incorporated into the analysis. In the aforementioned and other related work, univariate analyses are the most used tools for analyzing postprandial metabolism, e.g., the repeated measures analysis of variance (ANOVA) method is used to test group differences [3, 5]; a linear mixed model (LMM), which adds the random effect in the analysis, is used to explore the time and group interaction effect [8]. Univariate methods are simple and powerful in terms of studying group differences; however, such analyses are performed per metabolite, thus unable to reveal the relation between metabolites. Since metabolites are interlinked via pathways and hold dependent or independent relations with other metabolites, multivariate methods are promising in terms of capturing the underlying patterns in the data [4, 9]. Supervised multivariate methods used to analyze metabolomics data from challenge tests often combine ANOVA or LMM with principal component analysis (PCA). Such methods include ANOVA-Simultaneous Component Analysis (ASCA)[10], ANOVA-PCA [11], ANOVA-Partial Least Squares (PLS) [12], and their extensions [13, 14]. While these methods are suitable for time-resolved data, they rely on a priori known group information.

When the goal is exploratory analysis to reveal unknown stratification of subjects such as subgroups among healthy or diseased subjects, the workhorse data analysis approaches are unsupervised methods based on PCA and clustering [9]. However, these approaches cannot exploit the three-way structure of postprandial metabolomics data. They rely on either measurements at one time point or summaries such as clusters of time profiles obtained using data averaged across subjects [2, 15]. An effective way to preserve the multiway nature of the data and extract the underlying patterns is to use tensor factorizations [16,17,18], which are extensions of matrix factorizations such as PCA to multiway arrays (also referred to as higher-order data sets).

In this paper, we arrange time-resolved metabolomics data as a three-way array with modes: subjects, metabolites, and time, and use the CANDECOMP/PARAFAC (CP) [19, 20] tensor model to reveal the underlying patterns. The CP model summarizes the three-way array as a sum of a small number of factor triplets as in Fig. 1. Among various tensor factorization methods, we use the CP model because of its uniqueness properties [16, 21], which facilitate interpretability of the extracted factors. In addition, the CP model is less sensitive to noise due to the concise structure (parsimony) of the model [22]. The CP model has previously been mentioned as a promising analysis tool for meal challenge test data [4, 9], but no analysis results have been presented for the three-way data arranged as in Fig. 1. Recently, the effectiveness of the CP model in terms of analyzing other types of longitudinal data has been studied, e.g., using gut microbiome data from infants studying microbial changes and how those relate to the birth mode [23], urine metabolomics data from newborns exploring the use of the CP model for compositional data [24] and simulated dynamic metabolomics data relying on small-scale metabolic pathway models [25]. Unlike previous studies, we provide a comprehensive study of the CP model for analyzing time-resolved metabolomics data, in particular, focusing on the human metabolism in response to a meal challenge test and studying metabolic changes from a baseline state to dynamic states.

Fig. 1
figure 1

An R-component CP model of a three-way array with modes: subjects, metabolites, and time. Vectors \({{\mathbf {{a}}}}_{r}\), \({{\mathbf {{b}}}}_{r}\) and \({{\mathbf {{c}}}}_{r}\) correspond to the patterns in the subjects, metabolites, and time modes

In the literature, analysis of full-dynamic data (i.e., full postprandial data without baseline correction) as well as T0-corrected data (i.e., the data corrected by subtracting the fasting state, similar to the method of analysis of changes [26]) have been considered [1, 2, 5, 27, 28]. Note that the full-dynamic data has information from the fasting as well as the T0-corrected (pure-dynamic) state. Understanding metabolic differences between these two states can help understand the metabolic mechanisms for postprandial metabolomics data. In this paper, we explore whether the CP model should be applied to T0-corrected data or full-dynamic data for postprandial metabolomics data analysis. Furthermore, we investigate the added value of analysis of the postprandial state compared to the fasting state.

In order to assess the performance of analysis approaches in terms of revealing the underlying patterns in the data, we generate simulated postprandial metabolomics data with known ground-truth information. To mimic the evolution of metabolite concentrations during a meal challenge, we used a human metabolic model [29] that describes metabolic pathways in eight organs in mechanistic detail by enzyme kinetic equations. Variation of disease-relevant parameters generated different subject groups (i.e., control versus diseased). Through extensive computational experiments, we assess the performance of analysis of fasting-state data using PCA, T0-corrected data using a CP model and full-dynamic data also using a CP model. We demonstrate that (i) CP models of postprandial metabolomics data reveal meaningful underlying patterns, (ii) for understanding metabolic differences among subject groups, it is crucial to analyze both fasting-state and T0-corrected data, (iii) depending on the nature of the subject group structure, the best performance in terms of revealing subject groups may be achieved by CP models of T0-corrected or full-dynamic data, and (iv) patterns extracted by CP models are reliable (i.e., consistently observed in a number of different settings such as when subsets of subjects are left-out, in the presence of a higher level of within-group variation, and with an unbalanced number of control and diseased subjects).

Materials and methods

Simulated postprandial metabolomics data

Human whole-body metabolic model

To generate simulated postprandial metabolomics data, we consider a human whole-body metabolic model proposed by Kurata [29]. The model is defined by a set of ordinary differential equations, with metabolites as variables and kinetic constants as parameters. Consisting of 202 metabolites, 217 reaction rates and 1,140 kinetic parameters, this multi-scale and multi-organ model is the largest, comprehensive, and highly predictive kinetic model of the whole-body metabolism [29]. It describes each reaction with a reversible Michaelis-Menten type rate equation and includes the action of insulin and glucagon, key hormones that govern the response to a meal (Fig. 2). The metabolic model considers both intracellular and extracellular metabolites. The meal in this simulated model only considers glucose and fat, where 87 grams (g) carbohydrate and 33 g fat is given after a 10-hour fasting. Although the metabolic model provides two parameters to regulate glucose and fat intake, we observe no apparent improvement in comparison with the real data for different amounts of glucose and fat intake.

Fig. 2
figure 2

Metabolic network considered in each organ in the human whole-body model except for the pancreas; see [29] for more details

Generation of the simulated data

To test whether our proposed unsupervised approach is able to reveal different subject groups, we generate simulated blood metabolite concentrations using two types of variation: (i) between-group variation and (ii) within-group variation. The between-group variation is introduced by changing a specific parameter, i.e., Km_Ins_B_M or Km_inssyn_Glc_B, in the differential equations, as demonstrated in [29]. These two parameters are used to regulate the insulin-stimulated glucose uptake in skeletal muscle or the glucose-stimulated insulin secretion by the pancreas, respectively. Each type of between-group variation mimics one disease, which is as follows [29]:

  • Insulin resistance in skeletal muscle: It is used for the study of type 2 diabetes mellitus and simulated by multiplying the default Km_Ins_B_M with 1.5,

  • Beta-cell dysfunction: Its extreme case is type 1 diabetes mellitus and it is simulated by multiplying the default Km_inssyn_Glc_B with 1.1.

For the within-group variation, i.e., individual variation, we randomly perturb the kinetic constants in the liver (see Additional file 1 for a list of perturbed parameters). The perturbation level is denoted by \(\alpha\). For example, \(\alpha =0.2\) indicates that related kinetic constants are set to be random parameters ranging from \((100-20)\%\) to \((100+20)\%\) of their default values.

Based on these definitions of between and within-group variations, we generate data for each subject as follows:

  1. 1.

    Get the 10-hour fasting concentrations for each individual (one individual has one set of random kinetic constants) by running the human whole-body metabolic model with the default initial values of model variables in [29], with an exception of the initial concentrations for particular metabolites presented in the next paragraph;

  2. 2.

    Start the meal challenge, i.e., run the human whole-body model using each individual’s 10-hour fasting state, and take the concentrations of metabolites at time points \(t=[0, 0.25, 0.5, 1, 1.5, 2, 2.5, 4]\) hours (with \(T_0=0\)h, \(T_1=0.25\)h, \(\cdots\), \(T_7=4\)h).

We choose the above time points to match the time samples in the real data, which we use to assess how realistic the simulations are (see section “Simulated postprandial metabolomics data are realistic”). In step one, we set initial values of Insulin (Ins) and blood metabolites Glucose (Glc), Pyruvate (Pyr), Lactate (Lac), Alanine (Ala), \(\beta\)-hydroxybutyrate (Bhb), Triglyceride and total Cholesterol to the median of fasting concentrations in the real data. Although the metabolic model involves 202 metabolites (including hormones) in different organs, we only consider hormone Ins and blood metabolites Glc, Pyr, Lac, Ala and Bhb since they are also measured in the real data. In addition, they are involved in the same metabolic network (see Fig. 2), which makes the comparison of real and simulated data easier. When simulating data for a diseased subject, we use the parameter values given for insulin resistance and beta-cell dysfunction; otherwise, default values for parameters Km_Ins_B_M and Km_inssyn_Glc_B are used to simulate data for control subjects.

The simulated data from multiple subjects is arranged as a three-way array (\(\#\) subjects \(\times\) 6 metabolites \(\times\) \(\#\) time points) as in Fig. 1.

Simulated postprandial metabolomics data are realistic

We compare the simulated data with the real data to demonstrate how realistic the simulations are. We use the real data corresponding to Nuclear Magnetic Resonance (NMR) spectroscopy measurements of plasma samples collected during a meal challenge test from the COPSAC\(_{2000}\) cohort [30]. The real data is not publicly available but may be shared by COPSAC through a collaboration agreement. The study was conducted in accordance with the Declaration of Helsinki and was approved by The Copenhagen Ethics Committee (KF 01-289/96 and H-16039498) and The Danish Data Protection Agency (2015-41-3696). Written consent was obtained from the participants. The cohort considered in this work consists of 299 healthy 18-year-old subjects (144 males and 155 females). The blood samples were collected at eight time points following an overnight (at least 9-hour) fasting during the fasting and postprandial states, i.e., at \(t=[0, 0.25, 0.5, 1, 1.5, 2, 2.5, 4]\)h. Over two hundred features were measured. We select the six features mentioned in section “Generation of the simulated data” since these are the ones available in both real and simulated data.

Time profiles of metabolites in simulated and real data match well for most metabolites except for Pyr and Lac, where we noticed deviations (See Additional file 2: Fig. S2.1). The simulated Pyr differed most from the standard reference range [0.04, 0.1]mmol/L. To get a better correspondence for the initial concentrations, we performed a sensitivity analysis on the model and tuned the parameters related to Pyr and Lac (see Additional file 2: Fig. S2.2). After parameter tuning, all initial (fasting) concentrations, except Lac, corresponded well between the real and simulated data (see Fig. 3). For Lac, we note that also in [29] the measured concentration of Lac is higher than the simulated value (1 versus 0.5 mM). In our subjects, the Lac concentration is even higher, suggesting a difference between the cohorts. We do not necessarily expect a perfect correspondence between the real and simulated data throughout the time course since the simulated meal (taken from [29]) differs from the real meal (see Additional file 2: Table S2.1). In addition, the large individual variability in the real data puts the discrepancy between the real and simulated data in perspective, as demonstrated in Fig. 3. The simulated model is realistic for our study in the sense that we observe that responses of key metabolites upon the meal challenge are in the physiological range, which is what we are interested in. Changing the metabolic model to simulate a meal-intake similar to the real data is outside the scope of this study.

Fig. 3
figure 3

Median concentrations across all subjects following a meal challenge test. The standard meal in the real data includes 60 g palm olein, 75 g glucose and 20 g protein [31] and is given after overnight fasting, while the meal in simulations contains 87 g glucose and 33 g fat after 10-hour fasting

CANDECOMP/PARAFAC (CP) model

The CP model, which stands for Canonical Decomposition (CANDECOMP) [20] and Parallel Factor Analysis (PARAFAC) [19], stems from the polyadic form of a tensor [32]. Similar to the matrix Singular Value Decomposition (SVD), the main idea of the CP model is to represent a tensor as the sum of a minimum number of rank-one tensors (Fig. 1). The CP model has been successfully used in many disciplines, including data mining [17, 33], neuroscience [34, 35], chemometrics [22], and recently also in longitudinal omics data analysis [23] in terms of revealing the underlying patterns from complex data sets.

For a third-order tensor \({{\mathscr {X}}}\in \mathbb {R}^{I\times J\times K}\), an R-component CP model represents the tensor as follows:

$$\begin{aligned} {{\mathscr {X}}} \approx \sum _{r=1}^{R} {{\mathbf {{a}}}}_{r}\circ {{\mathbf {{b}}}}_{r}\circ {{\mathbf {{c}}}}_{r}, \end{aligned}$$
(1)

where \(\circ\) denotes the vector outer product, \({{\mathbf {{A}}}} = [{{\mathbf {{a}}}}_{1} \hspace{0.05in}... \hspace{0.05in} {{\mathbf {{a}}}}_{R}] \in {\mathbb R}^{I \times R}\), \({{\mathbf {{B}}}}= [{{\mathbf {{b}}}}_{1} \hspace{0.05in}... \hspace{0.05in} {{\mathbf {{b}}}}_{R}]\in {\mathbb R}^{J \times R}\), and \({{\mathbf {{C}}}}= [{{\mathbf {{c}}}}_{1} \hspace{0.05in}... \hspace{0.05in} {{\mathbf {{c}}}}_{R}]\in {\mathbb R}^{K \times R}\) are the factor matrices corresponding to each mode. Columns of the factor matrices, e.g., \({{\mathbf {{a}}}}_{r}, {{\mathbf {{b}}}}_{r}\), and \({{\mathbf {{c}}}}_{r}\) corresponding to the rth component, reveal the underlying patterns in the data. When assessing how well the CP model fits the data, we use the explained variance (also referred to as model fit):

$$\begin{aligned} \text {Fit} (\%)=(1-\frac{\left\| {{\mathscr {X}}}-\hat{\mathscr {X}}\right\| ^2}{\left\| {{\mathscr {X}}}\right\| ^2})\times 100, \end{aligned}$$

where \(\hat{\mathscr {X}} =\sum _{r=1}^{R} {{\mathbf {{a}}}}_{r}\circ {{\mathbf {{b}}}}_{r}\circ {{\mathbf {{c}}}}_{r}\) is the data approximation based on the CP model, and \(\left\| .\right\|\) denotes the Frobenius norm. A fit value close to \(100\%\) indicates that data \({{\mathscr {X}}}\) is explained well by the model; otherwise, there is an unexplained part left in the residuals.

The CP model has been widely used in applications as a result of its uniqueness property, i.e., factors are unique up to permutation and scaling ambiguities under mild conditions, without imposing additional constraints such as orthogonality [16, 21]. This uniqueness property facilitates interpretation. In the CP model, each rank-one component (\({{\mathbf {{a}}}}_{r}, {{\mathbf {{b}}}}_{r}, {{\mathbf {{c}}}}_{r}\)), i.e., rth column of factor matrices in all modes, can be interpreted, for instance, in terms of identifying groups of subjects (subjects with similar coefficients in \({{\mathbf {{a}}}}_{r}\)) positively or negatively related to specific sets of metabolites (metabolites with large coefficients in \({{\mathbf {{b}}}}_{r}\)) following certain temporal profiles (given by \({{\mathbf {{c}}}}_{r}\)). We normalize vectors (\({{\mathbf {{a}}}}_{r}, {{\mathbf {{b}}}}_{r}, {{\mathbf {{c}}}}_{r}\)) to unit norm when we interpret the factors. Due to the scaling ambiguity in the CP model, the norms can be absorbed arbitrarily by the vectors as long as the product of the norms of the vectors stays the same. In the presence of missing entries, which is common in applications, it is still possible to analyze such incomplete data using a CP model [36, 37].

The selection of the number of components (i.e., R) when fitting a CP model is a challenging task [16, 38], and an active research topic [39]. Among existing approaches, we make use of the model fit and the core consistency diagnostic [40]. In addition, we choose the number of components based on the interpretability of the model and replicability of the factors [39, 41] in subsets of the data (see Additional file 3: for details about the selection of number of components).

The performance of the CP model does not depend on the data size, i.e., the number of dimensions in each mode: I, J, K. If the data follows a CP structure, i.e., a sum of rank-one tensors, the CP model can reveal the underlying patterns even with data from a few subjects. For instance, fluorescence spectroscopy measurements of mixtures can be arranged as a third-order tensor with modes: mixtures, emission wavelengths, and excitation wavelengths. When such data is analyzed using a CP model, each rank-one component can capture one of the chemicals in the mixtures, and has done so successfully with only few, e.g., five, mixtures [22]. However, there are still challenges that will be encountered in the case of small data set size in any mode. First, the number of factors that can be uniquely extracted from the data will be limited [16, 21] as uniqueness conditions depend on the number of dimensions in each mode. Furthermore, using replicability to determine the number of components will not be possible with a small number of samples (that holds for a resampling strategy for any model). Finally, in the time mode, it is important to have enough time points to get temporal profiles.

Analysis approach

We analyze each postprandial metabolomics data set using three methods: (i) PCA of the fasting-state data, (ii) CP model of the T0-corrected data, and (iii) CP model of the full-dynamic data. We compare the fasting-state analysis with T0-corrected analysis to understand metabolic differences between these two states. We compare these three approaches to obtain a better picture in terms of subject group differences and underlying metabolic mechanisms. When assessing subject group differences, we apply the unpaired (two-sample) t-test to each subject component and the null hypothesis is that the two groups come from independent random samples from normal distributions with equal means, without the assumption of equal variances. The main focus of the paper is to investigate metabolic differences associated with subject group differences during the fasting and pure-dynamic states, rather than striving for the best group separation.

Experimental set-up and implementation details

In our experiments with simulated data, we consider the analysis of eight data sets generated using different types of between-group variation, different levels of individual variations, and balanced or unbalanced groups (see Table 1 for a complete list of data sets). In the balanced case, there are 50 controls and 50 diseased subjects; in the unbalanced case, there are 70 control and 30 diseased subjects.

Table 1 Eight simulated data sets generated using different settings.

Before the CP analyses, the three-way data is preprocessed by centering across the subjects mode (to remove the common intercept from all subjects) and then scaling within the metabolites mode (to adjust for scale differences in different metabolites), i.e., dividing by the root mean squared value of each slice in the metabolites mode (see [42] for details about preprocessing multiway arrays). Before PCA, the fasting-state data is preprocessed similarly, i.e., centered and scaled.

For fitting CP models, we use cp-opt [43] and cp-wopt [37] from the Tensor Toolbox version 3.1 [44] with the Limited Memory BFGS optimization algorithm. cp-wopt uses weighted optimization [37] for fitting CP to incomplete data. For PCA, we use the svd function from MATLAB. Multiple random initializations are used to avoid local minima when fitting CP models. For the unpaired t-test, we use the ttest2 function from MATLAB.

When comparing CP models for different data sets, we use the factor match score (FMS) to quantify the similarity of factors in specific modes, e.g., the metabolites and time modes. FMS is defined as follows:

$$\begin{aligned} \text {FMS}=\frac{1}{R}\sum _{r=1}^{R}\frac{|{{{\mathbf {{b}}}}_{r}}^{\textsf{T}}{\hat{{\mathbf {b}}}_{r}}|}{\left\| {{\mathbf {{b}}}}_{r}\right\| \left\| \smash {\hat{{\mathbf {b}}}_{r}}\right\| }\frac{|{{{\mathbf {{c}}}}_{r}}^{\textsf{T}}{\hat{{\mathbf {c}}}_{r}}|}{\left\| {{\mathbf {{c}}}}_{r}\right\| \left\| \smash {\hat{{\mathbf {c}}}_{r}}\right\| } \end{aligned}$$

where \({{\mathbf {{b}}}}_{r}\), \({{\mathbf {{c}}}}_{r}\) and \({\hat{{\mathbf {b}}}}_{r}\), \({\hat{{\mathbf {c}}}}_{r}\) denote the rth column of factor matrices in the metabolites and time modes extracted by CP models from two different data sets (after finding the best permutation of the columns). FMS values are between 0 and 1, and an FMS value close to 1 indicates high similarity between the CP factors extracted from two different data sets.

All experiments are performed in MATLAB 2020a. Simulated data sets in Table 1 and example scripts are available in the GitHub repository for the paper: https://github.com/Lu-source/project-of-challenge-test-data.

Results

CP models extract similar factors from the simulated versus real meal challenge data

Since subjects in the real data are considered healthy, we expect that the CP model extracts similar patterns in the metabolites and time modes from real data (299 subjects) versus the simulated data with only the 50 control subjects (\(\alpha =0.2\)). Patterns extracted from the subjects mode are omitted since the subjects mode corresponds to different individuals in simulated versus real data. Fig. 4 shows that the 3-component CP model of each data set reveals similar patterns from the metabolites and time modes, to a certain extent.

The first component (\({{\mathbf {{b}}}}_{1}\) and \({{\mathbf {{c}}}}_{1}\)) in Fig. 4 mainly models metabolites Pyr, Lac and Ala (i.e., metabolites with large coefficients) which share similar dynamic patterns, i.e., being accumulated and then consumed. It makes sense that these three metabolites stay close since they are tied to each other with reactions Pyr\(\leftrightarrow\)Lac and Pyr\(\leftrightarrow\)Ala as shown in the pathway in Fig. 2. However, we observe different peak heights and different time points for the highest peak in the time mode in real versus simulated data. The different peak heights are possibly because meal compositions are different in real versus simulated data. The different time points at the highest peak may be due to the fact that different individuals reach the highest peak at different time in the real data, and the first component of real data (with \({{\mathbf {{c}}}}_{1}\)) captures the temporal profiles of a group of subjects with the highest peak appearing at around 0.5h in metabolites Lac, Ala and Pyr (see Additional file 4 for details).

The second component (see \({{\mathbf {{b}}}}_{2}\) and \({{\mathbf {{c}}}}_{2}\) in Fig. 4) mainly captures Ins and Glc, which are accumulated and then consumed, after the meal challenge but at a different speed than Lac and Ala. However, we observe that the coefficient for Pyr is significantly different in real versus simulated data. For the simulated data, the coefficient of Pyr is almost zero since the time point at the highest peak for all individuals in the simulated data is the same for each metabolite, and Pyr reaches to the highest peak at different time point compared with Ins and Glc. On the other hand, in real data, there are several subjects with dynamic patterns of Pyr reaching the highest peak at the same time point compared with Ins and Glc (see Additional file 4 for more details). This results in a positive coefficient for Pyr in \({{\mathbf {{b}}}}_{2}\) in real data.

The third component (\({{\mathbf {{b}}}}_{3}\) and \({{\mathbf {{c}}}}_{3}\) in Fig. 4) mainly models Bhb, which behaves differently from other metabolites after a meal challenge, i.e., being consumed first and then getting accumulated. This behaviour has been observed in a consistent way in both real and simulated data.

Fig. 4
figure 4

Factors in the metabolites (top plots) and time (bottom plots) modes extracted from the real versus simulated data using a 3-component CP model. The vectors \({{\mathbf {{b}}}}_{1}\), \({{\mathbf {{b}}}}_{2}\) and \({{\mathbf {{b}}}}_{3}\) are the components in the metabolites mode, and vectors \({{\mathbf {{c}}}}_{1}\), \({{\mathbf {{c}}}}_{2}\) and \({{\mathbf {{c}}}}_{3}\) are the temporal patterns extracted by the CP model. Model fits for the real and simulated data are 50.3% and 71.6%, respectively

Insulin-resistant versus control group with low within-group variation

In this section, we show that fasting and T0-corrected analysis together reveal the metabolic differences between the fasting and dynamic states. Furthermore, we demonstrate that T0-corrected analysis performs better than full-dynamic analysis in terms of revealing subject group differences.

Fasting-state analysis

PCA of the fasting-state data captures the subject group difference. The scatter plot of PC3 (the unpaired t-test using PC3 gives a p-value of \(2 \times 10^{-9}\)) and PC5 (p-value of \(2\times 10^{-7}\)) in Fig. 5 shows the best group separation. Note that PC3 and PC5 together explain only \(21\%\) of the data, indicating that the individual variation may dominate the total variation in the data. Figure 5 demonstrates that all metabolites except Ins and Glc are related to the subject group separation at the fasting state. Among them, Pyr contributes the most and is negatively related to the insulin-resistant group (i.e., the insulin-resistant group has mainly positive (negative) score values, and Pyr has a negative (positive) coefficient on PC3 (PC5)). This negative association indicates that the insulin-resistant group has lower concentration of Pyr at the fasting state. This results from a lower insulin-stimulated glucose uptake into the skeletal muscle in the insulin-resistant group, leading to less conversion of glucose into pyruvate, which may be supported by the observations in the literature [45, 46].

Fig. 5
figure 5

PCA scatter plot with the best group separation for the fasting-state data with insulin-resistant and control groups. The individual variation is introduced by setting the random perturbation level as \(\alpha = 0.2\)

T0-corrected analysis

We choose the 4-component CP model for the T0-corrected data (see Additional file 3 for the selection of number of components). This model reveals the subject group difference, as shown in Fig. 6 (subjects mode). Among all components, the fourth component gives the best subject group separation (see the comparison of \({{\mathbf {{a}}}}_{4}\) with \({{\mathbf {{a}}}}_{1}\), \({{\mathbf {{a}}}}_{2}\) and \({{\mathbf {{a}}}}_{3}\)). In the metabolites mode (see \({{\mathbf {{b}}}}_{4}\)), metabolite Pyr has the largest coefficient, followed by Glc, while others have almost zero coefficients. In the time mode, \({{\mathbf {{c}}}}_{4}\) captures the dynamic behavior of mainly Pyr, i.e., it increases first and then slightly decreases. The increase of Pyr may be due to the glycolysis after the meal intake and its decrease afterwards may result from its conversion to other metabolites, e.g., Bhb, Lac and Ala, as shown in the pathway plot (Fig. 2). \({{\mathbf {{a}}}}_{4}\), \({{\mathbf {{b}}}}_{4}\) and \({{\mathbf {{c}}}}_{4}\) together indicate that the concentration of Pyr increases more in the insulin-resistant group than in the control group; this suggests that the T0-corrected Pyr is positively related to the insulin-resistant group. Compared with \({{\mathbf {{a}}}}_{4}\), \({{\mathbf {{a}}}}_{1}\) and \({{\mathbf {{a}}}}_{2}\) reveal a weaker subject group separation due to the dynamic behavior of Lac & Ala (see \({{\mathbf {{b}}}}_{1}\) and \({{\mathbf {{c}}}}_{1}\)) and Ins & Glc (see \({{\mathbf {{b}}}}_{2}\) and \({{\mathbf {{c}}}}_{2}\)), respectively. Without capturing any group separation (see \({{\mathbf {{a}}}}_{3}\)), the third component reveals that metabolite Bhb (it has the largest coefficient on \({{\mathbf {{b}}}}_{3}\) while others have almost zero coefficients) responds differently compared to other metabolites after the meal intake (see \({{\mathbf {{c}}}}_{3}\) versus \({{\mathbf {{c}}}}_{1}\), \({{\mathbf {{c}}}}_{2}\) and \({{\mathbf {{c}}}}_{4}\)), i.e., it decreases first and then slowly increases, as also shown in an earlier study [47].

Fig. 6
figure 6

Factors of the 4-component CP model extracted from the T0-corrected data with insulin-resistant and control groups. The individual variation is set by using the random perturbation level \(\alpha =0.2\). The vectors \({{\mathbf {{a}}}}_{r}\), \({{\mathbf {{b}}}}_{r}\) and \({{\mathbf {{c}}}}_{r}\), \(r=1, \ldots , 4\), are the components in the subjects, metabolites and time modes extracted by the CP model.

Full-dynamic analysis

Figure 7 demonstrates the 4-component CP model of the full-dynamic data. This model captures both the fasting-state information (\({{\mathbf {{c}}}}_{4}\) reveals a constant temporal profile) and the T0-corrected information (i.e., the first and second components). The first component, i.e., \({{\mathbf {{a}}}}_{1}\), \({{\mathbf {{b}}}}_{1}\) and \({{\mathbf {{c}}}}_{1}\), and the second component, i.e., \({{\mathbf {{a}}}}_{2}\), \({{\mathbf {{b}}}}_{2}\) and \({{\mathbf {{c}}}}_{2}\), reveal the positive relation of Lac, Ala & Pyr and Ins & Glc with the insulin-resistant group and their dynamic response after the meal intake, respectively. The fourth component mainly captures the information from the fasting state, as evidenced by a nearly constant time profile in the time mode \({{\mathbf {{c}}}}_{4}\). At the fasting state, insulin-resistant subjects have higher glucose concentrations [45, 46], which may result in lower concentrations of pyruvate since the conversion of glucose to pyruvate is less efficient due to insulin resistance. Such a negative association between pyruvate and insulin-resistant subjects is captured by the fourth component, where we see negative coefficients for insulin-resistant subjects on \({{\mathbf {{a}}}}_{4}\) and a positive coefficient for pyruvate on \({{\mathbf {{b}}}}_{4}\).

Fig. 7
figure 7

Factors of the 4-component CP model extracted from the full-dynamic data with insulin-resistant and control groups. The individual variation is set by using the random perturbation level \(\alpha =0.2\). The vectors \({{\mathbf {{a}}}}_{r}\), \({{\mathbf {{b}}}}_{r}\) and \({{\mathbf {{c}}}}_{r}\), \(r=1, \ldots , 4\), are the components in the subjects, metabolites and time modes extracted by the CP model

Comparison of three analysis approaches

Through three analysis approaches, we observe four underlying mechanisms related to (i) Pyr, (ii) Ins & Glc, (iii) Lac, Ala & Pyr, and (iv) Bhb.

Pyr is identified by all methods as an essential metabolite related to the subject group separation. However, fasting-state versus T0-corrected analysis reveals a different relation between Pyr and the insulin-resistant group. This, in turn, results in T0-corrected analysis performing better than full-dynamic analysis in terms of revealing group differences (see the comparison of the subjects mode in Fig. 6 and 7). The best separation in T0-corrected analysis is given by the fourth component, where Pyr has the dominant coefficient in the metabolites mode. The left panel of Fig. 8 shows that although the insulin-resistant group has a lower concentration of Pyr at the fasting state, Pyr accumulates much faster in the insulin-resistant group during the postprandial state (especially from 0.5h to 2 h). In this sense, subtracting the fasting-state data from each time slice of the full-dynamic data leads to a more evident group difference between control versus insulin-resistant subjects, as demonstrated in the right panel of Fig. 8.

The other mechanisms, (ii) Ins & Glc, (iii) Lac, Ala & Pyr, and (iv) Bhb, are captured through similar patterns in both T0-corrected and full-dynamic analysis.

Fig. 8
figure 8

Median time profiles of Pyr for the raw full-dynamic data (Left panel) and T0-corrected data (Right panel) with insulin-resistant versus control groups. The random perturbation level for generating individuals is \(\alpha =0.2\)

Beta-cell dysfunction versus control group with low within-group variation

In this section, we demonstrate that fasting and T0-corrected analysis together reveal the metabolic differences between the fasting and dynamic states. In addition, our analysis results indicate that full-dynamic analysis performs better than T0-corrected analysis in terms of separating subject groups.

Fasting-state analysis

The fasting-state analysis captures the subject group difference. The scatter plot of PC3 (with p-value of \(8\times 10^{-7}\) from the unpaired t-test) and PC5 (with p-value of \(2\times 10^{-8}\)) gives the best separation between the groups (Fig. 9). Figure 9 also shows that metabolites Glc and Pyr contribute the most since they have large absolute coefficients along the discrimination direction (line \(y=-x\)). In addition, it is shown that metabolite Glc (Pyr) is positively (negatively) related to the beta-cell dysfunction group. This is consistent with Additional file 2: Fig. S2.3, which shows that the beta-cell dysfunction group has a higher concentration of Glc and a lower concentration of Pyr at the fasting state compared with the control group. The higher plasma Glc in the beta-cell dysfunction group results from inadequate glucose sensing to stimulate insulin secretion. Due to this inefficient secretion, glycolysis becomes less efficient, and less Pyr is produced, leading to a lower level of plasma Pyr in the beta-cell dysfunction group.

Fig. 9
figure 9

PCA scatter plot with the best group separation for the data set with beta-cell dysfunction and control groups. The individual variation is introduced by setting the random perturbation level to \(\alpha =0.2\)

T0-corrected analysis

We choose the 4-component CP model for the T0-corrected data. This model captures the subject group separation, and the best separation is achieved by \({{\mathbf {{a}}}}_{2}\) (see Fig. 10). In the metabolites mode (\({{\mathbf {{b}}}}_{2}\)), Ins has the largest coefficient followed by Glc, while other metabolites almost have no contribution. This indicates that the dynamic changes of Ins and Glc contribute to the subject group separation, and Ins is the most significant one. Ins is shown to be negatively associated with the beta-cell dysfunction group. It makes sense that Glc stays close to Ins and shares a similar dynamic pattern as Ins (see Additional file 2: Fig. S2.4) since Ins and Glc regulate each other. In the time mode, \({{\mathbf {{c}}}}_{2}\) captures the dynamic pattern shared by Ins and Glc, i.e., an increase after the meal intake and then a decrease due to the glycolysis. \({{\mathbf {{a}}}}_{1}\) (with p-value of \(8\times 10^{-5}\)) and \({{\mathbf {{a}}}}_{4}\) (with p-value of \(2\times 10^{-4}\)) capture weak group separation in the subjects mode. In the metabolites mode, \({{\mathbf {{b}}}}_{1}\) captures the fact that Pyr, Lac and Ala respond to the meal challenge similarly, and this makes sense since they are close to each other, as shown in the pathway in Fig. 2.

Fig. 10
figure 10

Factors of the 4-component CP model of T0-corrected data with beta-cell dysfunction and control groups. The individual variation is given by setting the random perturbation level to \(\alpha =0.2\). The vectors \({{\mathbf {{a}}}}_{r}\), \({{\mathbf {{b}}}}_{r}\) and \({{\mathbf {{c}}}}_{r}\), \(r=1, \ldots , 4\), are the components in the subjects, metabolites and time modes extracted by the CP model

Full-dynamic analysis

We choose the 4-component CP model to analyze the full-dynamic data, with factor plots shown in Fig. 11. \({{\mathbf {{a}}}}_{4}\) reveals the subject group differences best. In the metabolites mode (\({{\mathbf {{b}}}}_{4}\)), metabolites Glc and Pyr have the largest coefficients in terms of absolute values. This component also reveals the positive (negative) relation of metabolite Glc (Pyr) with the beta-cell dysfunction group (i.e., the beta-cell dysfunction group has negative coefficients in the subjects mode while metabolite Glc (Pyr) has a positive (negative) coefficient in the metabolites mode). In the time mode, we observe a non-constant dynamic profile (\({{\mathbf {{c}}}}_{4}\)) with a large coefficient at \(t=0\) comparable to the coefficients of other time points. This indicates that the fourth component captures a mixture of information from the fasting and T0-corrected states, and the fasting-state signal is relatively strong. The model also captures the pattern (the second component) in T0-corrected analysis (see \({{\mathbf {{a}}}}_{2}\), \({{\mathbf {{b}}}}_{2}\) and \({{\mathbf {{c}}}}_{2}\) in Fig. 11), which shows that the beta-cell dysfunction group is negatively related with Ins.

Fig. 11
figure 11

Factors of the 4-component CP model for full-dynamic data with beta-cell dysfunction and control groups. The individual variation is given by setting the random perturbation level to \(\alpha =0.2\). The vectors \({{\mathbf {{a}}}}_{r}\), \({{\mathbf {{b}}}}_{r}\) and \({{\mathbf {{c}}}}_{r}\), \(r=1, \ldots , 4\), are the components in the subjects, metabolites and time modes extracted by the CP model

Comparison of three analysis approaches

In this case, we observe four underlying mechanisms related to (i) Glc & Pyr, (ii) Ins & Glc, (iii) Lac, Ala & Pyr, and (iv) Bhb.

While both fasting-state analysis and T0-corrected analysis can separate subject groups, the mechanisms responsible for group separation are different, with Glc & Pyr playing a role at the fasting-state analysis while Ins & Glc playing the main role in the T0-corrected analysis. We also observe that full-dynamic analysis outperforms the other two approaches in terms of capturing the subject groups (see the comparison of Figs. 10 and 11). The best separation in the full-dynamic analysis is provided by the fourth component, which captures a mixture of the information from the fasting and T0-corrected states. The Glc & Pyr signal at the fasting state gets stronger through the dynamic information. This is consistent with the more evident group difference observed at the postprandial state from the time profiles of Glc (see Additional file 2: Fig. S2.3).

The other mechanisms (iii) Lac, Ala & Pyr, and (iv) Bhb are captured similarly using both T0-corrected and full-dynamic data analysis.

The proposed analysis approach reveals metabolic aberrations

The crucial question in an unsupervised analysis is whether such an analysis can find groups among subjects pointing to metabolic differences or even metabolic deficiencies. The two studied metabolic deficiencies clearly show different patterns for the metabolites Ins, Glc and Pyr in the different analyses. These different patterns can be understood using the underlying metabolic network, thereby validating the power of the data analysis approach in terms of its ability to discriminate between different types of metabolic differences. More specifically, insulin-resistant subjects are expected to have higher fasting Glc levels, lower fasting Pyr levels, and higher pure-dynamic Ins levels. On the other hand, beta-cell dysfunction subjects are expected to have higher fasting Glc levels, lower fasting Pyr levels, and lower pure-dynamic Ins levels ([29] and references therein). The primary metabolic differences between these two metabolic deficiencies are related to Ins levels, with insulin-resistant group having a reduced ability to respond to Ins, resulting in higher Glc levels, which in turn stimulate further insulin secretion by the pancreas; conversely, those with beta-cell dysfunction have impaired insulin production, resulting in lower Ins levels. Figure 5 together with Fig. 6, and Fig. 9 together with Fig. 10 indicate that our analyses can capture such metabolic differences between these two different types of metabolic deficiencies. Although we considered here only two metabolic deficiencies, we expect this property to hold also for other types of metabolic differences, especially when - in real data - there are more than six metabolites.

CP models reveal stable patterns

Within-group variation can be enormous in real data making data analysis more challenging. For example, large within-group variations in healthy subjects might mask the changes of metabolites in response to diseases when exploring biomarker candidates [48]. Our numerical experiments demonstrate that CP models reveal stable patterns even when the level of individual variation is higher. Figure 12 shows the CP factors extracted using T0-corrected analysis from data sets generated with insulin-resistant and control groups using two different levels of individual variation, i.e., \(\alpha =0.2\) and \(\alpha =0.4\). We observe very similar factors in the metabolites and time modes even with different levels of individual variation. In addition, we consider data sets with an unbalanced number of control and diseased subjects. Table S2.2 and Table S2.3 in Additional file 2 show that CP factors are similar for data sets using balanced versus unbalanced samples, and using different levels of individual variations, with the smallest FMS value of 0.90.

Fig. 12
figure 12

Comparison of the factors in the metabolites and time modes extracted from the T0-corrected data of two data sets with different levels of individual variation. Both data sets have 50 insulin-resistant subjects and 50 control subjects. Individual variations are introduced by setting the random perturbation level to \(\alpha =0.2\) in one data set versus \(\alpha =0.4\) in the other data set. The vectors \({{\mathbf {{a}}}}_{r}\), \({{\mathbf {{b}}}}_{r}\) and \({{\mathbf {{c}}}}_{r}\), \(r=1, \ldots , 4\), correspond the components in the subjects, metabolites and time modes extracted by the CP models

The behaviours of CP models are also tested on data sets with random Gaussian noise mimicking the measurement error. The relative standard deviation (RSD) of the measurements encountered in this study can range from \(4\%\) to \(15\%\), which aligns closely with the range mentioned in [49]. Our experimental findings indicate that the CP models are robust to noise when RSD of a replicate measurement is moderate. For example, when the RSD is \(10\%\) or \(6\%\) for the data set with insulin-resistant versus control groups or the data set with beta-cell dysfunction versus control groups, respectively, the deconvoluted CP patterns are very similar compared to their noise-free versions with the FMS values (noisy versus noiseless patterns) greater than 0.92. With a further increase in RSD, the added noise may have an impact on the number of components in the CP models and the extracted patterns. More detailed results about comparisons of the patterns extracted by the CP models from noisy versus noiseless data can be found in Additional file 5.

Discussion

In this paper, we have proposed an unsupervised approach based on the CP model to improve the analysis of the postprandial metabolomics data. The main focus is to demonstrate how the proposed method can effectively capture the differences in metabolic responses among different subject groups during a meal challenge test (in particular, a priori unknown stratifications rather than the setting where there are a priori labels as in, e.g., clinical interventions). Using the simulated data with known ground truth, we have demonstrated several benefits of the proposed approach: One benefit is the extraction of the underlying patterns in all modes simultaneously, which reveals subjects groups, related metabolites as well as corresponding temporal patterns from time-resolved postprandial data. In addition, we have shown that CP models can extract reliable patterns under various settings, e.g., even for data with considerable within-group variation. Another benefit is an overall picture of metabolic differences at the fasting versus T0-corrected state, which can be obtained by using PCA for the fasting-state data and the CP model for the T0-corrected data analysis. Understanding such metabolic differences helps to explore the added value in the postprandial state due to metabolites’ dynamic responses. This, in turn, provides insightful guidance on why the full-dynamic analysis may capture a stronger or weaker group difference compared to the separate analyses of the fasting-state data and T0-corrected data in longitudinal analysis. The simulation studies indicate that such metabolic differences depend on the nature of the between-group variation, i.e., the data itself, which may be related to metabolic deficiency or aberration due to genetic variations in different groups of subjects. Therefore, to gain a more comprehensive understanding of the underlying metabolic mechanisms, particularly the interplay between the fasting and T0-corrected signals, we recommend using these three analyses together. Also, note that while our experiments rely on data sets consisting of a control and a diseased group, the proposed analysis approach holds the promise to reveal subject group differences due to different diseases, e.g., insulin-resistant versus beta-cell dysfunction versus control group, as well as subgroups of diseases, e.g., insulin-resistant versus less insulin-resistant versus control group.

We expect the proposed approach to be useful also for other applications of time-resolved data analysis, for example, other types of challenge test data such as exercise challenge tests [50]. A common feature of such data is that they are time-resolved and hold both the baseline and pure-dynamic information. The essential idea is to analyze the baseline data together with the baseline-corrected data to understand their differences. If group structures are detected, understanding such differences will facilitate better utilization of the time-resolved information.

A side product of our study is an approach to generate meaningful simulated postprandial metabolomics data. We use a comprehensive metabolic model to create such data. While the metabolic model was previously proposed, we tuned the model based on a comparison with our real data set including hundreds of subjects. Such simulated models may also be beneficial in terms of the optimal design of meal challenge tests.

Nevertheless, there are also several limitations in this study. Using the metabolic model, we have only generated data where the concentration of each metabolite reaches the highest peak at the same time points for all subjects. However, in reality, time points at the highest peak vary from one subject to another. The main difficulty in generating such data is tuning the parameters while keeping the concentrations in reasonable ranges. Since the mathematical metabolic model involves 1140 parameters and 202 metabolites in eight different organs, parameter tuning is not trivial. Other limitations come from assumptions of the data analysis method. The CP model assumes that temporal patterns (i.e., \({{\mathbf {{c}}}}_{r}\)) are the same across subjects as we have in the simulated data; therefore, it may not be able to capture the individual differences such as shifted temporal profiles. Furthermore, another assumption in the CP model is that the factor matrix in the metabolites modes is the same across all time slices. However, those factor matrices may change over time. To account for such individual-specific temporal profiles or time-evolving factors in the metabolites mode, alternative multiway methods such as the PARAFAC2 model [51] may prove useful as in the analysis of neuroimaging signals [52,53,54] and in chemometrics [55]. We plan to study the performance of PARAFAC2 model in terms of analyzing dynamic metabolomics data as well as joint analysis of dynamic metabolomics data together with other omics measurements [56].

Availability of code, data and materials

All code and simulated data are available in a GitHub repository at https://github.com/Lu-source/project-of-challenge-test-data. NMR measurements of plasma samples collected during the challenge test from the COPSAC2000 cohort are available through a collaboration agreement from COPSAC. For data access requests, please contact Morten A. Rasmussen (morten.arendt@dbac.dk).

References

  1. Harte AL, Varma MC, Tripathi G, McGee KC, Al-Daghri NM, Al-Attas OS, Sabico S, O’Hare JP, Ceriello A, Saravanan P, et al. High fat intake leads to acute postprandial exposure to circulating endotoxin in type 2 diabetic subjects. Diabetes Care. 2012;35(2):375–82.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  2. Wopereis S, Stroeve JHM, Stafleu A, Bakker GCM, Burggraaf J, Erk MJ, Pellis L, Boessen R, Kardinaal AAF, Ommen B. Multi-parameter comparison of a standardized mixed meal tolerance test in healthy and type 2 diabetic subjects: the phenflex challenge. Genes Nutrit. 2017;12(21):1–14.

    Google Scholar 

  3. Wojczynski MK, Glasser SP, Oberman A, Kabagambe EK, Hopkins PN, Tsai MY, Straka RJ, Ordovas JM, Arnett DK. High-fat meal effect on ldl, hdl, and vldl particle size and number in the genetics of lipid-lowering drugs and diet network (goldn): an interventional study. Lipids Health Dis. 2011;10(181):1–11.

    Google Scholar 

  4. Lépine G, Tremblay-Franco M, Bouder S, Dimina L, Fouillet H, Mariotti F, Polakof S. Investigating the postprandial metabolome after challenge tests to assess metabolic flexibility and dysregulations associated with cardiometabolic diseases. Nutrients. 2022;14(3):472.

    Article  PubMed  PubMed Central  Google Scholar 

  5. Kumar AA, Satheesh G, Vijayakumar G, Chandran M, Prabhu PR, Simon L, Kutty VR, Kartha CC, Jaleel A. Postprandial metabolism is impaired in overweight normoglycemic young adults without family history of diabetes. Sci Rep. 2020;10:353.

    Article  ADS  CAS  PubMed  PubMed Central  Google Scholar 

  6. Zeevi D, Korem T, Zmora N, Israeli D, Rothschild D, Weinberger A, Ben-Yacov O, Lador D, Avnit-Sagi T, Lotan-Pompan M, et al. Personalized nutrition by prediction of glycemic responses. Cell. 2015;163(5):1079–94.

    Article  CAS  PubMed  Google Scholar 

  7. Berry SE, Valdes AM, Drew DA, Asnicar F, Mazidi M, Wolf J, Capdevila J, Hadjigeorgiou G, Davies R, Al Khatib H, et al. Human postprandial responses to food and potential for precision nutrition. Nat Med. 2020;26(6):964–73.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  8. Müllner E, Röhnisch HE, Von Brömssen C, Moazzami AA. Metabolomics analysis reveals altered metabolites in lean compared with obese adolescents and additional metabolic shifts associated with hyperinsulinaemia and insulin resistance in obese adolescents: A cross-sectional study. Metabolomics. 2021;17(1):1–13.

    Article  Google Scholar 

  9. Vis DJ, Westerhuis JA, Jacobs DM, Duynhoven JP, Wopereis S, Ommen B, Hendriks MM, Smilde AK. Analyzing metabolomics-based challenge tests. Metabolomics. 2015;11(1):50–63.

    Article  CAS  Google Scholar 

  10. Smilde AK, Jansen JJ, Hoefsloot HC, Lamers R-JA, Van Der Greef J, Timmerman ME. Anova-simultaneous component analysis (asca): a new tool for analyzing designed metabolomics data. Bioinformatics. 2005;21(13):3043–8.

    Article  CAS  PubMed  Google Scholar 

  11. Harrington P, Vieira NE, Espinoza J, Nien JK, Romero R, Yergey AL. Analysis of variance-principal component analysis: a soft tool for proteomic discovery. Anal Chim Acta. 2005;544(1–2):118–27.

    Article  CAS  Google Scholar 

  12. Thissen U, Wopereis S, Berg SA, Bobeldijk I, Kleemann R, Kooistra T, Dijk K, Ommen B, Smilde AK. Improving the analysis of designed studies by combining statistical modelling with study design information. BMC Bioinf. 2009;10(1):1–15.

    Article  Google Scholar 

  13. Thiel M, Feraud B, Govaerts B. ASCA+ and APCA+: extensions of ASCA and APCA in the analysis of unbalanced multifactorial designs. J Chemom. 2017;31(6):2895.

    Article  Google Scholar 

  14. Martin M, Govaerts B. LiMM-PCA: Combining ASCA+ and linear mixed models to analyse high-dimensional designed data. J Chemom. 2020;34(6):3232.

    Article  Google Scholar 

  15. Pellis L, van Erk MJ, van Ommen B, Bakker GCM, Hendriks HFJ, Cnubben NHP, Kleemann R, van Someren EP, Bobeldijk I, Rubingh CM, Wopereis S. Plasma metabolomics and proteomics profiling after a postprandial challenge reveal subtle diet effects on human metabolic status. Metabolomics. 2012;8:347–59.

    Article  CAS  PubMed  Google Scholar 

  16. Kolda TG, Bader BW. Tensor decompositions and applications. SIAM Rev. 2009;51(3):455–500.

    Article  ADS  MathSciNet  Google Scholar 

  17. Acar E, Yener B. Unsupervised multiway data analysis: a literature survey. IEEE Trans Knowl Data Eng. 2009;21(1):6–20.

    Article  Google Scholar 

  18. Smilde A, Bro R, Geladi P. Multi-Way Analysis: Applications in the Chemical Sciences. West Sussex: Wiley; 2004.

    Book  Google Scholar 

  19. Harshman RA. Foundations of the parafac procedure: Models and conditions for an “explanatory” multimodal factor analysis. UCLA working papers in phonetics 1970;16, 1–84.

  20. Carroll JD, Chang J-J. Analysis of individual differences in multidimensional scaling via an n-way generalization of “eckart-young’’ decomposition. Psychometrika. 1970;35(3):283–319.

    Article  Google Scholar 

  21. Kruskal JB. Three-way arrays: rank and uniqueness of trilinear decompositions, with application to arithmetic complexity and statistics. Linear Algebra Appl. 1977;18(2):95–138.

    Article  MathSciNet  Google Scholar 

  22. Bro R. PARAFAC. Tutorial and applications. Chemometrics and Intelligent Laboratory Systems 1997;38(2), 149–171.

  23. Martino C, Shenhav L, Marotz CA, Armstrong G, McDonald D, Vázquez-Baeza Y, Morton JT, Jiang L, Dominguez-Bello MG, Swafford AD, et al. Context-aware dimensionality reduction deconvolutes gut microbial community dynamics. Nat Biotechnol. 2021;39(2):165–8.

    Article  CAS  PubMed  Google Scholar 

  24. Gardlo A, Smilde AK, Hron K, Hrda M, Karlikova R, Friedeckỳ D, Adam T. Normalization techniques for PARAFAC modeling of urine metabolomic data. Metabolomics. 2016;12(12):1–13.

    CAS  Google Scholar 

  25. Li L, Hoefsloot H, Graaf AA, Acar E, Smilde AK. Exploring dynamic metabolomics data with multiway data analysis: a simulation study. BMC Bioinf. 2022;23(31):1–22.

    Google Scholar 

  26. Twisk J, Bosman L, Hoekstra T, Rijnhart J, Welten M, Heymans M. Different ways to estimate treatment effects in randomised controlled trials. Contemp Clin Trials Commun. 2018;10:80–5.

    Article  Google Scholar 

  27. Mattes RD. Oral fat exposure alters postprandial lipid metabolism in humans. Am J Clin Nutr. 1996;63(6):911–7.

    CAS  PubMed  Google Scholar 

  28. Wopereis S, Wolvers D, Erk M, Gribnau M, Kremer B, Dorsten FA, Boelsma E, Garczarek U, Cnubben N, Frenken L, et al. Assessment of inflammatory resilience in healthy subjects using dietary lipid and glucose challenges. BMC Med Genom. 2013;6:44.

    Article  Google Scholar 

  29. Kurata H. Virtual metabolic human dynamic model for pathological analysis and therapy design for diabetes. iScience. 2021;24(2): 102101.

    Article  ADS  CAS  PubMed  PubMed Central  Google Scholar 

  30. Bisgaard H. The Copenhagen Prospective Study on Asthma in Childhood (COPSAC): design, rationale, and baseline data from a longitudinal birth cohort study. Ann Allergy Asthma Immunol. 2004;93(4):381–9.

    Article  PubMed  Google Scholar 

  31. Stroeve JH, Wietmarschen H, Kremer BH, Ommen B, Wopereis S. Phenotypic flexibility as a measure of health: the optimal nutritional stress response test. Genes Nutrit. 2015;10(3):1–21.

    CAS  Google Scholar 

  32. Hitchcock FL. The expression of a tensor or a polyadic as a sum of products. J Math Phys. 1927;6(1):164–89.

    Article  Google Scholar 

  33. Papalexakis EE, Faloutsos C, Sidiropoulos ND. Tensors for data mining and data fusion: models, applications, and scalable algorithms. ACM Trans Intell Syst Technol. 2016;8(2):1.

    Article  Google Scholar 

  34. Acar E, Bingol CA, Bingol H, Bro R, Yener B. Multiway analysis of epilepsy tensors. Bioinformatics. 2007;23(13):10–8.

    Article  Google Scholar 

  35. Williams AH, Kim TH, Wang F, Vyas S, Ryu SI, Shenoy KV, Schnitzer M, Kolda TG, Ganguli S. Unsupervised discovery of demixed, low-dimensional neural dynamics across multiple timescales through tensor component analysis. Neuron. 2018;98(6):1099–115.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  36. Tomasi G, Bro R. Parafac and missing values. Chemom Intell Lab Syst. 2005;75(2):163–80.

    Article  CAS  Google Scholar 

  37. Acar E, Dunlavy DM, Kolda TG, Mørup M. Scalable tensor factorizations for incomplete data. Chemom Intell Lab Syst. 2011;106(1):41–56.

    Article  CAS  Google Scholar 

  38. Håstad J. Tensor rank is np-complete. J Algorithms. 1990;11(4):644–54.

    Article  MathSciNet  Google Scholar 

  39. Adali T, Kantar F, Akhonda MABS, Strother S, Calhoun VD, Acar E. Reproducibility in matrix and tensor decompositions: focus on model match, interpretability, and uniqueness. IEEE Signal Process Mag. 2022;39(4):8–24.

    Article  PubMed  PubMed Central  Google Scholar 

  40. Bro R, Kiers HA. A new efficient method for determining the number of components in PARAFAC models. J Chemom. 2003;17(5):274–86.

    Article  CAS  Google Scholar 

  41. Harshman RA, De Sarbo WS. An application of PARAFAC to a small sample problem, demonstrating preprocessing, orthogonality constraints, and split-half diagnostic techniques in Research Methods for Multimode Data Analysis. New York: Praeger; 1984. p. 602–42.

    Google Scholar 

  42. Bro R, Smilde AK. Centering and scaling in component analysis. J Chemom. 2003;17(1):16–33.

    Article  CAS  Google Scholar 

  43. Acar E, Dunlavy DM, Kolda TG. A scalable optimization approach for fitting canonical tensor decompositions. J Chemom. 2011;25(2):67–86.

    Article  CAS  Google Scholar 

  44. Bader BW, Kolda TG, et al. Matlab Tensor Toolbox, Version 3.1. https://www.tensortoolbox.org

  45. Petersen KF, Dufour S, Savage DB, Bilz S, Solomon G, Yonemitsu S, Cline GW, Befroy D, Zemany L, Kahn BB, et al. The role of skeletal muscle insulin resistance in the pathogenesis of the metabolic syndrome. Proc Natl Acad Sci. 2007;104(31):12587–94.

    Article  ADS  CAS  PubMed  PubMed Central  Google Scholar 

  46. Merz KE, Thurmond DC. Role of skeletal muscle in insulin resistance and glucose uptake. Compr Physiol. 2011;10(3):785–809.

    Google Scholar 

  47. Kahler A, Zimmermann M, Langhans W. Suppression of hepatic fatty acid oxidation and food intake in men. Nutrition. 1999;15(11–12):819–28.

    Article  CAS  PubMed  Google Scholar 

  48. Saito K, Maekawa K, Pappan KL, Urata M, Ishikawa M, Kumagai Y, Saito Y. Differences in metabolite profiles between blood matrices, ages, and sexes among caucasian individuals and their inter-individual variations. Metabolomics. 2014;10(3):402–13.

    Article  CAS  Google Scholar 

  49. Smilde AK, Werf MJ, Schaller J-P, Kistemaker C. Characterizing the precision of mass-spectrometry-based metabolic profiling platforms. Analyst. 2009;134(11):2281–5.

    Article  ADS  CAS  PubMed  Google Scholar 

  50. Vilozni D, Bentur L, Efrati O, Barak A, Szeinberg A, Shoseyov D, Yahav Y, Augarten A. Exercise challenge test in 3-to 6-year-old asthmatic children. Chest. 2007;132(2):497–503.

    Article  PubMed  Google Scholar 

  51. Harshman RA. Parafac2: Mathematical and technical notes. UCLA working papers in phonetics, 1972;22, 30–44

  52. Madsen KH, Churchill NW, Mørup M. Quantifying functional connectivity in multi-subject fMRI data using component models. Hum Brain Mapp. 2017;38:882–99.

    Article  PubMed  Google Scholar 

  53. Roald M, Bhinge S, Jia C, Calhoun V, Adali T, Acar E. Tracing network evolution using the PARAFAC2 model. In: ICASSP’20, 2020;1100–1104 .

  54. Acar E, Roald M, Hossain KM, Calhoun VD, Adali T. Tracing evolving networks using tensor factorizations vs. ica-based approaches. Front Neurosci 2022;16:861402.

  55. Bro R, Andersson CA, Kiers HA. Parafac2–part ii. modeling chromatographic data with retention time shifts. J Chemomet 1999;13:295–309.

  56. Jendoubi T, Ebbels TMD. Integrative analysis of time course metabolic data and biomarker discovery. BMC Bioinf. 2020;21:11.

    Article  Google Scholar 

Download references

Acknowledgements

We thank the children and families of the COPSAC\(_{2000}\) cohort for their contribution, and the clinical team at COPSAC for conducting the clinical testing.

Funding

The work presented in this article is supported by Research Council of Norway project 300489 (L. L, S. Y, M. A. R, A. K. S, E. A) and Novo Nordisk Foundation Grant NNF19OC0057934 (L. L, M. A. R, A. K. S, E. A).

Author information

Authors and Affiliations

Authors

Contributions

A. K. S and E. A formulated the research problem. B. C and M. A. R prepared for the real data. B. M. B, H. H and L. L prepared for the simulated data. L. L performed the data analysis. S. Y and E. A validated the analysis. L. L, E. A, A. K. S, B. M. B and H. H interpreted the simulated data. L. L, E. A, A. K. S, B. M. B, H. H and D. H performed the comparison between the real and simulated data. L. L, A. K. S and E. A prepared the original draft. All authors were involved in the preparation of the final manuscript.

Corresponding authors

Correspondence to Lu Li or Evrim Acar.

Ethics declarations

Ethics approval and consent to participate

All procedures performed in studies involving human participants were in accordance with the Declaration of Helsinki and was approved by The Copenhagen Ethics Committee (KF 01-289/96 and H-16039498) and The Danish Data Protection Agency (2015-41-3696). Written consent was obtained from the participants.

Consent for publication

Not applicable.

Competing interests

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be considered as a potential competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1

. A list of parameters that are randomly perturbated to generate different individuals.

Additional file 2

. Supplementary figures and tables for the main manuscript.

Additional file 3

. Selection of the number of components in CP models.

Additional file 4

. Comparison of CP models for real vs. simulated data.

Additional file 5

. Experiments on data sets with random noise.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Li, L., Yan, S., Bakker, B.M. et al. Analyzing postprandial metabolomics data using multiway models: a simulation study. BMC Bioinformatics 25, 94 (2024). https://doi.org/10.1186/s12859-024-05686-w

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1186/s12859-024-05686-w

Keywords