
Introduction

Diabetes now affects approximately 9.3% of the population worldwide [1]. Elevated prevalence is more evident in countries undergoing rapid nutrition transition, including China where estimated prevalence of diabetes and prediabetes in 2017 was 12.8% and 35.2%, respectively [2]. Compelling evidence shows that healthy dietary patterns and high physical activity can prevent or delay the onset of diabetes [3]. However, the complex effects of lifestyle modification on this disease remain to be elucidated. Recent advances in metabolomics, including lipidomics, provide a powerful tool with which to identify early biomarkers and specific metabolic disturbance(s) in the pathogenesis of diabetes, potentially facilitating the development of more precise prevention and therapeutic strategies.

Glycerophospholipids (GPLs) are the major lipids of cellular membranes, with phosphatidylcholines (PCs) and phosphatidylethanolamines (PEs) accounting for greater than 50% of the composition [4]. Accumulating evidence from animal studies indicates that disturbances of PCs and/or PEs, as well as their ratio, could contribute to a number of well-established risk factors of diabetes, including insulin resistance and impaired glucose tolerance [5, 6], endoplasmic reticulum (ER) stress [7] and obesity [6]. However, only a handful of prospective studies have investigated the associations between GPLs and risk of diabetes, with inconsistent findings from western populations [8,9,10,11]. For instance, the association between PC(36:4) and incident diabetes was positive in the European Prospective Investigation into Cancer and Nutrition (EPIC)-Potsdam study [9] but was negative in the Framingham Heart Study Offspring cohort [8]. To date, only one nested case–control study (n = 100 pairs) [12] has been conducted in a Chinese population and the diabetes-associated GPL varieties differed from those found in western studies [8,9,10,11]. Because circulating GPL concentrations and their associations with cardiometabolic diseases might vary among populations with different ethnic backgrounds and lifestyles, it is important to study these associations among Asian people, who have different genetic predispositions and dietary habits.

Previously, several intervention studies showed that a Mediterranean diet, low-glycaemic-index diet, or foods such as fish, dairy produce and soybean oil could alter circulating GPL profiles and/or concentrations of specific lipid metabolites [13,14,15,16,17]. A Mediterranean diet intervention reduced levels of PCs containing fatty acids with long chains (C16–20) and fewer double bonds, but increased levels of PCs containing fatty acids with very long chains (C20–22) and more double bonds [13]. Moreover, existing clinical trials also reported that both aerobic and acute exercise could lower the PC/PE ratio and remodel skeletal muscle levels of PC and/or PE [18,19,20]. Nevertheless, little is known about whether, or to what degree, dietary factors or physical activity influence the associations between GPLs and diabetes.

By applying a targeted high-coverage lipidomics approach, the current study aimed to investigate the following: (1) the associations of 160 plasma GPLs with incident diabetes; (2) the relations between dietary patterns and physical activity and diabetes-associated GPLs; and (3) the potential modifying effects of dietary patterns and physical activity on the associations in a well-established Chinese cohort study.

Methods

Study population

The study was based on the population from the Nutrition and Health of Aging Population in China (NHAPC) study, a prospective study among community-dwelling Chinese individuals, aged 50–70 years, in Beijing and Shanghai. The details of the study have been previously reported [21]. Briefly, participants were recruited from Beijing and Shanghai (megacities representing the north and the south of China) by a multistage sampling method in 2005. In both Beijing and Shanghai, two urban districts and one rural district were chosen to represent people with high and low socioeconomic status based on the residential registration record. The eligibility of the candidates was defined as those who were stable residents for at least 20 years in the areas and were free from the following conditions: (1) severe psychological disorders, physical disabilities, cancer, CVD, Alzheimer’s disease or dementia, within 6 months; or (2) a current diagnosis of tuberculosis, acquired immune deficiency syndrome and other communicable diseases. In 2005, 3289 eligible participants (1458 men, 1831 women) were recruited and in 2011, 2529 participants completed a 6-year follow-up survey. Of these, plasma samples for lipidome analysis were available for 2248 participants. Finally, after further excluding 274 individuals with prevalent diabetes at baseline and 97 individuals with extreme total energy intake (<3347 or >16,736 kJ/day for men and <2092 or >14,644 kJ/day for women), 1877 participants were included in the current analysis (ESM Fig. 1).

The study protocols of the baseline survey (grant no. E-2005-01) and the 6-year follow-up survey (grant no. E-2011-12) were approved by the Institutional Review Board of the Institute for Nutritional Sciences, Chinese Academy of Sciences, and abided by the Declaration of Helsinki principles. Written informed consent was obtained from all participants.

Data collection

At both baseline and 6-year follow-up visits, information on demographic variables, health status, lifestyle factors and medical history was obtained during a face-to-face interview by trained health professionals with standard questionnaires. Alcohol drinking was grouped into ‘yes’ or ‘no’ [21]. Family history of diabetes was defined as parent(s) or sibling(s) with diabetes. Physical activity was assessed by a modified International Physical Activity Questionnaire (short, last-7-days format; www.physio-pedia.com/images/c/c7/Quidelines_for_interpreting_the_IPAQ.pdf), and the level for each individual was calculated as the sum of metabolic equivalent (MET)-min/week score. Dietary information was collected using a 74-item food-frequency questionnaire modified from a validated questionnaire used in the 2002 National Nutrition and Health Survey in China [22]. Food intake was adjusted for total energy intake by using the residual model. Food items (g/day) were classified into 18 groups for further analysis [22]. After fasting overnight, all participants were invited to undergo a physical examination. Body weight, height, waist circumference (WC) and BP were measured by trained medical professionals following a standard protocol. BMI was calculated as weight (kg) divided by the square of height (m²).
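The residual model for energy adjustment and the BMI calculation can be illustrated with a minimal sketch. It is not the study's own code: the function names and the toy values are hypothetical, and the step of adding back the intake predicted at the mean energy intake is one common convention for the residual method that is assumed here.

```python
from statistics import mean


def energy_adjust(food_g_per_day, energy_kj_per_day):
    """Residual-method energy adjustment for one food group.

    Intake is regressed on total energy by simple least squares. Each
    person's adjusted intake is the residual plus the intake predicted at
    the mean energy intake (an assumed convention, not stated in the text).
    """
    x_bar = mean(energy_kj_per_day)
    y_bar = mean(food_g_per_day)
    sxx = sum((x - x_bar) ** 2 for x in energy_kj_per_day)
    sxy = sum((x - x_bar) * (y - y_bar)
              for x, y in zip(energy_kj_per_day, food_g_per_day))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    predicted_at_mean = intercept + slope * x_bar
    return [y - (intercept + slope * x) + predicted_at_mean
            for x, y in zip(energy_kj_per_day, food_g_per_day)]


def bmi(weight_kg, height_m):
    """BMI as weight (kg) divided by the square of height (m)."""
    return weight_kg / height_m ** 2


# Hypothetical toy data: energy intake (kJ/day) and rice intake (g/day)
energy = [6000.0, 8000.0, 10000.0, 12000.0]
rice = [150.0, 210.0, 240.0, 300.0]
adjusted_rice = energy_adjust(rice, energy)
```

By construction, the adjusted intakes are uncorrelated with total energy intake while keeping the original mean, which is the property the residual model is meant to provide.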

Laboratory measurements

After participants had fasted overnight, venous blood samples were collected in tubes containing EDTA as anticoagulant at baseline and follow-up surveys [21]. Blood samples were centrifuged at 2400 g for 15 min and stored at −80°C before analyses. Fasting blood glucose, HbA1c, insulin, total cholesterol (TCH), HDL-cholesterol (HDL-c), LDL-cholesterol (LDL-c) and triacylglycerols (TGs) were measured as previously described [23]. The HOMA-IR index was calculated as previously reported [24].
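For orientation, the widely used HOMA-IR formula is sketched below. The text only states that the index was calculated as previously reported [24], so the use of this standard formula (fasting glucose in mmol/l multiplied by fasting insulin in μU/ml, divided by 22.5) is an assumption, and the function name is hypothetical.

```python
def homa_ir(fasting_glucose_mmol_l, fasting_insulin_uU_ml):
    """Standard HOMA-IR: glucose (mmol/l) x insulin (uU/ml) / 22.5.

    Assumed formula; the study cites an earlier report for its calculation.
    """
    return fasting_glucose_mmol_l * fasting_insulin_uU_ml / 22.5


# Hypothetical values: glucose 5.0 mmol/l and insulin 9.0 uU/ml
example_homa_ir = homa_ir(5.0, 9.0)
```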

Erythrocyte fatty acid measurement

Baseline erythrocyte fatty acids were measured by GC coupled with positive chemical ionisation (Agilent 6890 N-5975B; Agilent Technologies, USA) [25]. Among 28 measured fatty acids, seven fatty acids in the de novo lipogenesis pathway (DNL) associated with carbohydrate intake [22], namely, myristic acid (14:0), palmitic acid (16:0), palmitoleic acid (16:1n-7), hexadecenoic acid (16:1n-9), stearic acid (18:0), vaccenic acid (18:1n-7) and oleic acid (18:1n-9), were included in the present analyses.

Lipidomics measurement

Baseline plasma lipid profiles were quantified by a targeted, high-coverage lipidomics approach constructed principally on liquid chromatography electrospray ionisation mass spectrometry (LC-ESI-MS/MS). Details on lipid extraction, chromatographic separation, MS analysis, data quantification and quality control processes are described elsewhere [24, 26]. Briefly, lipids were extracted from 10 μl plasma with a modified methyl tert-butyl ether protocol and then analysed by LC on a Shimadzu Nexera X2 LC-30AD system (Shimadzu Scientific Instruments, Japan) coupled with a Sciex 5500 QTRAP Triple Quadrupole Mass Spectrometer (Applied Biosystems/Sciex, Foster City, CA, USA). ACQUITY UPLC BEH HILIC Column (130 Å, 2.1 × 100 mm, 1.7 μm; Waters Corp Micromass UK, UK) was used for chromatographic separation, with the mobile phases for eluting lipids including 50:50 (vol./vol.) acetonitrile–water with 10 mmol/l ammonium acetate (A) and pure acetonitrile (B). Analyst 1.6.3 software (Applied Biosystems/Sciex) was applied for data acquisition in multiple reaction monitoring mode. Lipid species were quantified relatively according to their corresponding stable isotope-labelled standards. Plasma samples were analysed randomly and quality control samples were placed every ten samples to monitor the repeatability of the data. The specific transitions and experimental conditions of MS for analysing individual lipid species are presented in ESM Table 1. Finally, a total of 728 lipids were quantified, of which 160 GPLs (ten lysophosphatidylcholines [LPCs], one lysophosphatidylinositol, 54 PCs, 48 PEs, 14 alkylphosphatidylethanolamines [PE(O)s], 30 alkenylphosphatidylethanolamines [PE(P)s] and three phosphatidylserines [PSs]) were included in the current analyses after excluding lipids with missing rate >20% and/or CV >30%. 
Individual fatty acid moieties of GPLs at sn-1 and sn-2 positions were defined by their carbon chain length (the first number) and number of double bonds (the second number), with the absence of a prefix implying an acyl linkage; the O and P prefixes indicate alkyl and alkenyl (vinyl ether) linkages, respectively, according to the LIPID MAPS consortium [27].
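The naming convention above can be made concrete with a small parser. This is an illustrative sketch only: the function and class names are hypothetical, and it handles only names of the form used in this article, such as PC(16:0/18:1), LPC(20:3) or PE(P-18:0/20:4).

```python
import re
from typing import NamedTuple


class Chain(NamedTuple):
    linkage: str       # "acyl" (no prefix), "alkyl" (O-) or "alkenyl" (P-)
    carbons: int       # first number: chain length
    double_bonds: int  # second number: number of double bonds


_LIPID = re.compile(r"^([A-Za-z]+)\((.+)\)$")
_CHAIN = re.compile(r"^(?:([OP])-)?(\d+):(\d+)$")
_LINKAGE = {None: "acyl", "O": "alkyl", "P": "alkenyl"}
_PREFIX = {"acyl": "", "alkyl": "O-", "alkenyl": "P-"}


def parse_lipid(name):
    """Split a name such as 'PE(P-18:0/20:4)' into class and chains."""
    match = _LIPID.match(name.strip())
    if match is None:
        raise ValueError(f"unrecognised lipid name: {name!r}")
    lipid_class, body = match.groups()
    chains = []
    for part in body.split("/"):
        chain = _CHAIN.match(part)
        if chain is None:
            raise ValueError(f"unrecognised chain {part!r} in {name!r}")
        prefix, carbons, double_bonds = chain.groups()
        chains.append(Chain(_LINKAGE[prefix], int(carbons), int(double_bonds)))
    return lipid_class, chains


def sum_composition(name):
    """Collapse chain-resolved names to sum composition, e.g. PC(34:1)."""
    lipid_class, chains = parse_lipid(name)
    carbons = sum(c.carbons for c in chains)
    double_bonds = sum(c.double_bonds for c in chains)
    prefix = _PREFIX[chains[0].linkage]
    return f"{lipid_class}({prefix}{carbons}:{double_bonds})"
```

Collapsing to sum composition is useful when comparing with studies that report species such as PC(34:1) or PC(36:3) without resolving the individual fatty acyl chains.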

Definition of diabetes

Diabetes was defined by the following criteria, as previously described [28]: (1) fasting plasma glucose ≥7.0 mmol/l; (2) self-reported physician’s diagnosis of diabetes; or (3) taking any oral glucose-lowering medication or insulin.
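The case definition can be written as a simple rule. The sketch below is illustrative and uses hypothetical parameter names; the optional HbA1c criterion (≥48 mmol/mol) corresponds to the sensitivity analysis described under Statistical analysis and is switched off by default.

```python
def has_diabetes(fpg_mmol_l, self_reported_diagnosis,
                 on_glucose_lowering_treatment,
                 hba1c_mmol_mol=None, include_hba1c=False):
    """Return True if any diagnostic criterion is met.

    Main definition: fasting plasma glucose >= 7.0 mmol/l, self-reported
    physician's diagnosis, or use of oral glucose-lowering medication or
    insulin. HbA1c >= 48 mmol/mol is applied only when include_hba1c=True.
    """
    if fpg_mmol_l >= 7.0:
        return True
    if self_reported_diagnosis or on_glucose_lowering_treatment:
        return True
    if include_hba1c and hba1c_mmol_mol is not None and hba1c_mmol_mol >= 48:
        return True
    return False
```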

Statistical analysis

Descriptive statistics for the study population were obtained by calculating mean ± SD or median (IQR) for continuous variables, and count (%) for categorical variables. Missing values for GPLs were imputed with half of the minimum detectable values, due to their concentrations being below the detection limit [29]. Spearman correlation coefficients (rs) among GPLs and of GPLs with cardiometabolic traits as well as erythrocyte fatty acids were calculated after adjustment for age, sex, region (Beijing or Shanghai) and residence (urban or rural). GPLs were log-transformed and scaled to SD of 1 before further analysis. Associations of total physical activity (MET-min/week) with GPLs were evaluated by linear regression, after adjustment for age, sex, region (Beijing or Shanghai), residence (urban or rural), education level (0–6 years, 7–9 years or ≥10 years), current smoking (yes or no), alcohol drinking (yes or no), family history of diabetes (yes or no), use of lipid-lowering medication (yes or no), and BMI. The levels of physical activity were categorised as low or high by the sex-specific total MET median [30]. Because of the high incidence of diabetes (26.6%) in our cohort [31], the RRs of developing diabetes were estimated by using log-Poisson regression models. Potential confounding variables in regression models were selected by directed acyclic graph (DAG), helping to elucidate the underlying causal structure among variables and to choose a minimal sufficient adjustment set of covariates [32], including age, sex, region (Beijing or Shanghai), residence (urban or rural), education level (0–6 years, 7–9 years, or ≥10 years), current smoking (yes or no), alcohol drinking (yes or no), physical activity (low or high), TG and HOMA-IR (ESM Fig. 2). 
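To show how a log-Poisson model yields an RR, the sketch below fits log(E[y]) = b0 + b1·x for a binary outcome and a single predictor by Newton-Raphson and returns exp(b1). It is a simplified illustration with a hypothetical function name and toy data, not the study's analysis: covariate adjustment is omitted, and CIs, which for a binary outcome are usually based on a robust variance estimator, are not computed.

```python
import math


def log_poisson_rr(x, y, max_iter=50, tol=1e-10):
    """Fit log(E[y]) = b0 + b1*x by Newton-Raphson and return exp(b1).

    x: predictor (e.g. a standardised GPL, or a 0/1 exposure)
    y: 0/1 outcome (incident diabetes); at least one event is required
    """
    n = len(x)
    b0 = math.log(sum(y) / n)  # start at the overall risk
    b1 = 0.0
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector
        u0 = sum(yi - mi for yi, mi in zip(y, mu))
        u1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Fisher information (2 x 2, symmetric)
        i00 = sum(mu)
        i01 = sum(mi * xi for xi, mi in zip(x, mu))
        i11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 * i01
        # Newton step: information matrix inverse times score
        d0 = (i11 * u0 - i01 * u1) / det
        d1 = (i00 * u1 - i01 * u0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return math.exp(b1)


# Hypothetical toy data: 2/10 events when unexposed, 5/10 when exposed
exposure = [0] * 10 + [1] * 10
outcome = [1, 1] + [0] * 8 + [1] * 5 + [0] * 5
rr = log_poisson_rr(exposure, outcome)
```

With a single binary predictor the fitted RR equals the ratio of the two observed risks, which illustrates why this model estimates RRs directly; with a standardised continuous predictor, exp(b1) is the RR per SD increment.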
In addition to the multivariable model, exploratory analyses were performed to include other conventional variables such as family history of diabetes (yes or no), use of lipid-lowering medication (yes or no), BMI, systolic BP and HDL-c; p values were corrected for multiple testing via the false-discovery rate (FDR) by using the Benjamini–Hochberg method [33]. Sensitivity analysis was performed by the inclusion of HbA1c ≥48 mmol/mol (6.5%) as a further criterion to define diabetes. Stratified analysis was conducted according to age (<60 or ≥60 years), sex (male or female), region (Beijing or Shanghai), residence (urban or rural), education level (0–6 years, 7–9 years or ≥10 years), smoking (yes or no), alcohol (yes or no), physical activity (low or high), and BMI (<24 kg/m² or ≥24 kg/m²) at baseline. A likelihood ratio test was applied to examine the significance of interactions [34].
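The Benjamini–Hochberg correction can be sketched in a few lines. The function name and the example p values are hypothetical; the procedure itself is the standard step-up adjustment, in which the i-th smallest of m p values is multiplied by m/i and monotonicity is enforced from the largest p value downwards.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg FDR-adjusted p values in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value to the smallest, keeping a running
    # minimum so that adjusted values never decrease with rank.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical raw p values for four lipid species
fdr = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

An association would then be called significant when its adjusted value is below 0.05, matching the "FDR-corrected p < 0.05" threshold used in the Results.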

Weighted gene co-expression network analysis (WGCNA) was used to construct modules based on GPLs that were log-transformed and standardised to z scores before analysis (R package WGCNA version 1.51; https://cran.r-project.org/web/packages/WGCNA/index.html) [35]. Module eigengene (ME) derived from the first principal component of an identified module was representative of the module. Log-Poisson regression models were applied to evaluate the associations of lipid modules with risk of incident diabetes. The correlation networks were plotted in Cytoscape (v 3.7.1; https://cytoscape.org/release_notes_3_7_1.html).
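The idea of a module eigengene as the first principal component of a module can be sketched as follows. This is not the WGCNA implementation: it assumes the lipids have already been assigned to a module, uses plain power iteration instead of a singular value decomposition, and orients the score so that it correlates positively with the average standardised lipid level (an assumed convention). All names are hypothetical.

```python
import math
from statistics import mean, stdev


def standardise(column):
    """Convert one lipid's (log-transformed) values to z scores."""
    m = mean(column)
    s = stdev(column)
    return [(value - m) / s for value in column]


def module_eigengene(columns, n_iter=200):
    """First principal component score per participant for one module.

    columns: list of lists, one list of values per lipid species,
             all ordered by the same participants.
    """
    z = [standardise(column) for column in columns]
    p = len(z)
    n = len(z[0])
    # p x p cross-product matrix of the standardised lipids
    cross = [[sum(a * b for a, b in zip(z[i], z[j])) for j in range(p)]
             for i in range(p)]
    # Power iteration for the leading eigenvector (loadings)
    v = [1.0] + [0.0] * (p - 1)
    for _ in range(n_iter):
        w = [sum(cross[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(v[j] * z[j][k] for j in range(p)) for k in range(n)]
    # Orient the eigengene along the average standardised lipid level
    average = [mean(z[j][k] for j in range(p)) for k in range(n)]
    if sum(s * a for s, a in zip(scores, average)) < 0:
        scores = [-s for s in scores]
    return scores
```

The resulting per-participant score can then be scaled to an SD of 1 and entered into a regression model in the same way as an individual GPL.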

For each GPL and module that were associated with incident diabetes, reduced rank regression (RRR) was performed along with the PLS (partial least squares regression) procedure in SAS v 9.2 (SAS Institute, Cary, NC, USA) to identify a dietary pattern based on 18 predefined food groups that could best explain its variation, following adjustment for age, sex, region and residence [36]. Before the analysis, GPLs and modules were normalised and scaled to SD of 1. Major foods constituting a given dietary pattern were defined as those with absolute values of factor loadings ≥0.20. The first factor obtained by RRR was representative of a dietary pattern score [36]. Stratified analysis of associations of GPL and/or module with diabetes was implemented based on levels of the dietary pattern score for corresponding GPLs or modules (≥ median value, or < median value). Distributions of GPLs and/or modules according to quartiles of macronutrient intake were compared by ANCOVA, with adjustment for age, sex, region and residence.

Analyses were performed with Statistical Analysis Software (SAS) (SAS Institute), SPSS version 25.0 (IBM Corporation, Armonk, NY, USA) and R version 3.4.4 (http://www.R-project.org). A two-sided p value <0.05 was considered statistically significant unless specified otherwise.

Results

Baseline characteristics of participants

During the 6 years of follow-up, 499 (26.6%) participants developed diabetes. Compared with individuals who did not develop diabetes, those who did develop diabetes were more likely to be Beijing residents and have a family history of diabetes. They also had higher baseline values for BMI, WC, BP, fasting glucose, HbA1c, fasting insulin, HOMA-IR, TCH, LDL-c and TG, and lower values of HDL-c (Table 1).

Table 1 Baseline characteristics of participants who did not develop diabetes (n = 1378) or did develop diabetes (n = 499) during 6 years of follow-up

Glycerophospholipids and incident diabetes

As shown in ESM Table 2 (Model 1), 73 of 160 GPLs were significantly associated with incident diabetes (RRs ranged from 0.89 to 1.28 per SD increment; FDR-corrected p < 0.05), after adjustment for age, sex, region and residence. When additionally adjusted for other covariates selected by DAG, including education level, current smoking, alcohol drinking, physical activity, TG and HOMA-IR, the associations remained significant for eight GPLs, namely LPC(20:3), PC(16:0/16:1, 16:0/18:1, 16:0/20:3, 18:0/16:1, 18:0/20:3, 18:1/20:3) and PE(16:0/16:1) (RRs ranged from 1.13 to 1.25 per SD increment; FDR-corrected p < 0.05; Table 2 and ESM Table 2, Model 2). In exploratory analyses, the eight aforementioned associations were unchanged when further controlled for other conventional variables, including family history of diabetes, use of lipid-lowering medication, BMI, systolic BP and HDL-c (FDR-corrected p < 0.05; Table 2 and ESM Table 2, Model 3). Of note, four of these eight GPLs contained saturated and monounsaturated fatty acyl chains. In sensitivity analysis, the associations remained similar when HbA1c ≥48 mmol/mol (6.5%) was further added to define diabetes (ESM Table 3). In the stratified analysis, most of the GPL–diabetes associations did not differ substantially according to age, sex, region, residence, education, smoking, alcohol, or BMI status (FDR-corrected p for interaction > 0.05; ESM Table 4).

Table 2 RRs (95% CIs) of diabetes after 6 years of follow-up according to quartile and per SD increment of GPLs (N = 1877)

The diabetes-associated GPLs were correlated with established diabetes biomarkers, particularly baseline TG (rs = 0.27–0.55), fasting glucose (rs = 0.17–0.30) and TCH (rs = 0.17–0.37) (all p < 0.001), after adjustment for age, sex, region and residence (ESM Fig. 3).

Network analysis

In the WGCNA analysis, nine modules based on the 160 plasma GPLs were identified (each indicated by a different colour in Fig. 1a). Generally, each subnetwork module contained GPLs within the same subclass, and with similar acyl chain length and number of double bonds at the sn-2 position. Of the nine modules, the blue module composed of most PCs (Fig. 1b) was positively associated with the risk of incident diabetes, with an RR (95% CI) of 1.16 (1.06, 1.26) per SD increment of the module score (FDR-corrected p < 0.05; Table 3). Similar to the results for individual GPLs, the association of the blue module with diabetes did not differ significantly between subgroups stratified by age, sex, region, residence, education, smoking, alcohol or BMI status (FDR-corrected p for interaction > 0.05; ESM Table 4). Moreover, the blue module was also moderately correlated with TG (rs = 0.51, p < 0.001), TCH (rs = 0.36, p < 0.001) and fasting glucose (rs = 0.22, p < 0.001) (Fig. 1c).

Fig. 1 WGCNA analysis of GPL profiles. The nodes represent individual lipid species and the edges indicate the weighted correlation coefficients between lipid species. (a) A total of nine lipid subnetwork modules, indicated by different colours (black, blue, brown, pink, red, green, turquoise, magenta and yellow), were detected by topological overlap measure. (b) The blue module, which was significantly associated with risk of incident diabetes. (c) Pairwise correlation heatmap of nine modules and metabolic traits with adjustment for age, sex, region and residence. Both values and colours within cells represent Spearman correlation coefficients (rs). DBP, diastolic BP; FI, fasting insulin; FPG, fasting glucose; SBP, systolic BP

Table 3 RRs (95% CIs) of diabetes after 6 years of follow-up according to quartile and per SD increment of MEs

Dietary factors, fatty acids in de novo lipogenesis pathway, glycerophospholipids and incident diabetes

Four of the eight significant GPLs, namely, PC(16:0/16:1), PC(16:0/18:1), PC(18:0/16:1) and PE(16:0/16:1), which contained saturated and monounsaturated fatty acyl chains, were positively correlated with carbohydrate intake and carbohydrate/fat ratio but negatively correlated with fat intake (FDR-corrected p for trend < 0.05; ESM Table 5). Moreover, all the significant GPLs and modules were moderately correlated with fatty acids in the DNL pathway, especially 16:1n-7 (rs = 0.35–0.62), and stearoyl-CoA desaturase activity reflected by 16:1n-7/16:0 ratio (rs = 0.33–0.59) (ESM Fig. 4).

In RRR analysis, the four diabetes-associated GPLs carrying DNL fatty acyl chains were correlated with dietary patterns characterised by high portions of refined grains (noodles and rice) but low portions of fish, dairy and soy products (|factor loading| ≥0.20; Fig. 2 and ESM Table 6). However, the other four significant GPLs that contained C20:3, namely, LPC(20:3), PC(16:0/20:3, 18:0/20:3, 18:1/20:3), as well as the blue module, were only correlated with low intake of fish, dairy or soy products (|factor loading| ≥0.20, p < 0.001; ESM Fig. 5). Nevertheless, all the significant associations between GPLs and risks of incident diabetes were generally consistent among participants with different levels of dietary pattern score (FDR-corrected p for interaction > 0.05; ESM Table 4).

Fig. 2 Food loadings derived by RRR for PC(16:0/16:1) (a), PC(16:0/18:1) (b), PC(18:0/16:1) (c) and PE(16:0/16:1) (d). The x-axis represents food groups, and the y-axis shows the loadings of the corresponding food groups by RRR. Food groups with |factor loading| ≥0.20 (dashed line) were components of dietary patterns related to a given lipid species/module

Physical activity, GPLs and incident diabetes

Total physical activity (MET-min/week) was inversely associated with concentrations of the diabetes-associated GPLs/module, though only the associations for LPC(20:3), PC(16:0/20:3), PC(18:0/20:3) and PC(18:1/20:3) reached statistical significance (FDR-corrected p < 0.05; ESM Table 7). Notably, when the level of physical activity was stratified as low or high (<2226 vs ≥2226 MET-min/week in men; or <2079 vs ≥2079 MET-min/week in women), the aforementioned significant associations between GPLs/module and diabetes were primarily observed in participants with low, but not high, physical activity (all FDR-corrected p for interaction < 0.05; Table 4).

Table 4 RRs (95% CIs) of diabetes after 6 years of follow-up per SD increment of GPLs/MEs among subgroups stratified by physical activity status

Discussion

With high-coverage targeted lipidomics, eight GPLs (five novel and three reported previously), mainly PCs, were found to be positively associated with incident diabetes over a period of 6 years in a community-dwelling Chinese population. Four of the GPLs related to DNL were correlated with unhealthy dietary patterns, while the significant associations were only observed in those with a low, but not high, level of physical activity.

To the best of our knowledge, this is the first relatively large-scale Asian prospective cohort study investigating the associations between GPLs and incident diabetes. We identified three novel PCs (16:0/18:1, 18:0/16:1, 18:1/20:3) and confirmed three diabetes-associated PCs (16:0/16:1, 16:0/20:3, 18:0/20:3) previously reported by the EPIC-Potsdam and Malmö Diet and Cancer Cohort (MDC-CC) studies [9,10,11]. In the network analysis, the significant associations of individual PCs were further supported by the collective effects of the PC-containing module. Meanwhile, we also documented novel diabetes-associated LPC(20:3) and PE(16:0/16:1), somewhat similar to the positive associations of LPC(14:0) and PE score with diabetes incidence observed in the MDC-CC [11] and Prevención con Dieta Mediterránea study [37]. Since we measured HbA1c using frozen erythrocytes rather than fresh blood samples, as required by NGSP (http://www.ngsp.org/docs/methods.pdf), HbA1c ≥48 mmol/mol (6.5%) was not included as a diagnostic criterion for diabetes in the main analysis but was included in the sensitivity analysis to support the robustness of our findings. However, different GPL varieties, namely LPC(16:1), PE(P-18:0/20:4) and PC(34:3), were suggested to be significantly associated with diabetes incidence in a previous Chinese nested case–control study including 100 case–control pairs [12]. The discrepancies between that study and the current one, as well as western studies, could be ascribed to differences in study design, participant characteristics and analytical platforms. Notably, in the current study, most of the diabetes-associated GPLs belonged to the PC subclass (six PCs out of eight GPLs). By contrast, PE was the predominant GPL subclass that showed a positive association with incidence of the metabolic syndrome [26] in the same cohort population. It is unclear whether there are preferential links of specific GPL subclasses with certain cardiometabolic outcome(s), although human studies showed associations of PCs or the PC/PE ratio with some established diabetes risk factors, such as obesity [38, 39] and insulin resistance [38]. In fact, it was demonstrated that suppressing PC biosynthesis via a diet deficient in choline or deleting PE N-methyltransferase could improve insulin resistance, glucose tolerance, fasting glucose, insulin and weight gain in high-fat-diet-fed mice [5, 6, 40], which may partially underpin the observed GPL–diabetes associations.

Notably, four of the diabetes-associated GPLs, namely PC(16:0/16:1), PC(16:0/18:1), PC(18:0/16:1) and PE(16:0/16:1), with saturated and monounsaturated fatty acyl chains were associated not only with erythrocyte fatty acids in the DNL pathway (particularly 16:1n-7 [rs = 0.35–0.62, p < 0.001]) but also with unhealthy dietary patterns comprising a high proportion of refined grains but low proportions of fish, dairy and soy products. Low fish intake in the EPIC-Potsdam study was also associated with monounsaturated PCs, such as PC(34:1), equivalent to PC(16:0/18:1) and PC(18:0/16:1) in our study [41]. Similarly, our prior study in the same cohort population documented that diabetes-associated monounsaturated sphingolipids were significantly associated with DNL fatty acids [24]. Indeed, the unique DNL fatty acyl chains might reflect the abundant substrates for GPL biosynthesis, when DNL was upregulated by a high carbohydrate diet [9]. Previously, a trans-ethnic meta-analysis including four western and three Asian cohort studies demonstrated that Asians (Chinese and Japanese) with the highest white rice intake had a 55% higher risk of diabetes than those with lowest intake [42]. Moreover, our earlier study in the same cohort also showed that erythrocyte DNL fatty acids were associated with high carbohydrate/fat ratio (60.8%:27.0%) as well as elevated incidence of diabetes [43]. Although the underlying mechanisms linking high levels of DNL fatty acids with the pathogenesis of diabetes are not well understood, animal model studies revealed that high levels of DNL fatty acids could be involved in ER stress, endothelial dysfunction, activation of an inflammatory response, and insulin resistance [44, 45]. Likewise, these mechanisms might also partially underpin our observed positive associations between GPLs with DNL fatty acyl chains and incidence of diabetes. Collectively, our findings suggested that specific structures in GPLs might reflect certain dietary exposures linking certain metabolic pathway(s) with diabetes risks.

It is worth noting that when physical activity was considered, the positive GPL–diabetes associations for the eight GPLs were only significant in the participants with low physical activity. Though all these eight GPLs were correlated inversely with physical activity, only the correlations for LPC(20:3), PC(16:0/20:3), PC(18:0/20:3) and PC(18:1/20:3), but not the four DNL PCs and PE, reached statistical significance. Consistent with our findings, an inverse association between PC(36:3), equivalent to PC(16:0/20:3) in our study, and physical activity was also indicated in the EPIC-Potsdam study [46]. However, it remains unclear whether the significant associations between the DNL GPLs and physical activity were masked by unfavourable effects of fatty acids in the DNL pathway among those consuming unhealthy dietary patterns. As a well-established prevention strategy for diabetes, physical activity has been shown to improve glucose and lipid metabolism and insulin sensitivity and to suppress adiposity [47,48,49]. Nevertheless, little is known about whether or to what degree physical activity could modify GPL metabolism and the associations with diabetes. Data from RCTs suggest that both aerobic and acute exercise can reduce the PC/PE ratio and remodel PC and/or PE in skeletal muscle, which consequently improves insulin sensitivity and whole-body glucose tolerance [18,19,20]. In addition, studies in knockout mouse models also revealed that a lower PC/PE ratio could enhance mitochondrial biogenesis, oxidative metabolism and insulin sensitivity [19]. Thus, our study supports the notion that physical activity can modify the GPL–diabetes associations, independent of BMI (see Table 4). Of course, further studies are warranted to confirm our findings and to illuminate underlying mechanisms.

Our study had the following strengths: (1) the associations between GPLs and risks of incident diabetes, and the modifying effects of lifestyle on the associations, were investigated simultaneously; and (2) the broader spectrum of GPLs in the well-established cohort study allowed us to discover novel biomarkers and to explore comprehensively the relationships of GPLs with unhealthy lifestyles and associated metabolic pathways. Admittedly, our study also had some limitations. First, the findings from the middle-aged and elderly Chinese population may not be generalisable to other ethnic or younger populations. Second, physical activity and dietary intake were assessed by questionnaires; therefore, measurement errors and/or recall bias could not be avoided. Third, given the observational nature of the study, we cannot fully rule out residual confounding, despite extensive adjustments and prospective study design.

In conclusion, the current study found that eight GPLs, particularly PCs correlated with DNL, were associated with high 6-year incidence of diabetes in a Chinese population. The unfavourable associations might be worsened in people with low physical activity. Further studies are warranted to validate our findings and address underlying mechanism(s).