Lifestyle patterns and incident type 2 diabetes in the Dutch Lifelines cohort study

Highlights
• Lifestyle factors clustered in behavioral patterns within the population.
• Different lifestyle patterns were differentially associated with risk of developing type 2 diabetes.
• A lifestyle pattern may be a proxy for an underlying variable that is relevant for the prevention of type 2 diabetes.


Introduction
Type 2 diabetes is a major public health challenge that leads to considerable morbidity, mortality, and economic burden (Sun et al., 2022). Lifestyle is crucial to the prevention of type 2 diabetes. Adherence to a combination of healthy lifestyle factors (healthy diet, avoiding smoking, vigorous physical activity) has been found to substantially lower the risk of developing type 2 diabetes (Duan et al., 2022; Farhadnejad et al., 2022; Zhang et al., 2020).
For studying the relationships between lifestyle factors and type 2 diabetes, a single lifestyle factor approach has been widely applied. Studies have also examined the combined effects of lifestyle factors, for example by using an unweighted lifestyle score, but such scores do not take account of the distribution of the lifestyle factors in the population (Zhang et al., 2020). Prior studies have indicated that lifestyle factors often co-occur in behavioral patterns and may have interdependent effects on health (Davis et al., 2019; Ding et al., 2015; Hendryx et al., 2020; Hofstetter et al., 2014; Luo et al., 2021; Meader et al., 2016; Morris et al., 2016; Noble et al., 2015; Poortinga, 2007; van Etten et al., 2020; Watts et al., 2016). Better methodological approaches are therefore needed to understand the complexities of lifestyle factors and their associations with health.
For type 2 diabetes prevention, current evidence supports the relevance of targeting multiple lifestyle risk factors simultaneously (Meader et al., 2016; Noble et al., 2015; Tuomilehto et al., 2011). It is therefore essential to have a clear understanding of the clustering of lifestyle risk factors in the target populations. However, to date the knowledge base is lacking. Specifically, only three studies have identified lifestyle patterns in the Dutch population, and only one of them further studied their associations with risk of type 2 diabetes (de Vries et al., 2008; Hofstetter et al., 2014; van Etten et al., 2020). There is considerably less knowledge about the relevance of lifestyle patterns for type 2 diabetes prevention in the general population.
Abbreviations: BIC-LL, Bayesian information criterion with log likelihood for the number of parameters adjusted; FFQ, Food frequency questionnaire; LCA, Latent class analysis; LLDS, Lifelines Diet Score; MVPA, Moderate-to-vigorous physical activity; PAF, Population attributable fraction; SQUASH, Short QUestionnaire to ASsess Health-enhancing physical activity.
Previous studies on lifestyle patterns mainly included smoking, alcohol consumption, physical activity level, and fruit and vegetable intake (Davis et al., 2019; de Vries et al., 2008; Hendryx et al., 2020; Hofstetter et al., 2014; Luo et al., 2021; Poortinga, 2007; van Etten et al., 2020). However, those identified lifestyle patterns may not fully represent the overall lifestyle risk profiles. While fruit and vegetable intake is an important indicator of diet (Halvorsen et al., 2021), overall diet quality, commonly assessed by diet scores, may better represent the overall dietary "risk profile" of the target populations (Vinke et al., 2018). Moreover, high TV watching time, as an emerging lifestyle risk factor representing sedentary behavior, has been found to be a risk factor for type 2 diabetes and mortality, independent of moderate-to-vigorous physical activity (MVPA) (Patterson et al., 2018), while it has never been included in lifestyle pattern analysis. Therefore, incorporating overall diet quality and TV watching time in lifestyle pattern analysis will provide more information on the clinical relevance of lifestyle patterns.
Using a large Dutch population cohort, we aimed to reveal how lifestyle factors cluster within populations, i.e., the diverse lifestyle risk patterns of the population, and subsequently to investigate the prospective associations between lifestyle patterns and incident type 2 diabetes. The analysis focused on four traditional lifestyle factors and one emerging lifestyle factor: overall diet quality (Duan et al., 2021; Maghsoudi et al., 2016; Vinke et al., 2018), physical activity (Aune et al., 2015), smoking (Pan et al., 2015), risk drinking (Knott et al., 2015), and TV watching time (Patterson et al., 2018). These lifestyle factors are common in the general population. Having a clear understanding of how these common lifestyle factors cluster and how different lifestyle clusters affect type 2 diabetes risk will facilitate the design of effective prevention strategies at population level.

Study design and population
The Lifelines cohort study is a multidisciplinary prospective population-based cohort study that applies a unique three-generation design to study the health and health-related behaviors of 167,729 persons living in the north of The Netherlands. Before study entry, a signed informed consent form was obtained from each participant. The Lifelines study is conducted according to the principles of the Declaration of Helsinki and approved by the Medical Ethics Committee of the University Medical Center Groningen, The Netherlands. The overall design and rationale of the study have been described in detail elsewhere (Klijs et al., 2015;Scholtens et al., 2015).
Participants were included in the study between 2006 and 2013. So far, four assessment rounds have taken place: a baseline assessment (T1) and three follow-ups (T2-T4). Comprehensive physical examinations, biobanking, and questionnaires were conducted at T1 and T4. Follow-up questionnaires were issued to participants at T2, T3, and T4.
Participants aged between 35 and 65 years who were free of diabetes at baseline, and for whom lifestyle data was available were included in this study. Participants who had no follow-up data, or who reported the development of type 1 diabetes or gestational diabetes during follow-up were excluded. In total, 61,869 participants were included in the analysis (Supplementary Fig. S1).

Ascertainment of incident type 2 diabetes
Incident type 2 diabetes was assessed by self-report questionnaires during follow-up at T2, T3, and T4, as well as blood glucose and HbA1c measurements at T4. Blood measurements are not available at T2 and T3. Participants were considered an incident case if they met one of the following criteria: (1) self-reported newly developed type 2 diabetes since the last time they filled out a questionnaire; (2) fasting blood glucose ≥ 7.0 mmol/L; or (3) HbA1c ≥ 48 mmol/mol (6.5 %) (World Health Organization (WHO) and International Diabetes Federation (IDF), 2006).
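The three criteria above can be read as a simple "any of" rule. The following is a minimal sketch of that rule, not the study's actual code: the function and argument names are hypothetical, and missing blood measurements (as at T2 and T3) are represented by `None`.

```python
from typing import Optional

# Thresholds as stated in the text (WHO/IDF, 2006).
FASTING_GLUCOSE_CUTOFF_MMOL_L = 7.0
HBA1C_CUTOFF_MMOL_MOL = 48.0


def is_incident_case(
    self_reported_t2d: bool,
    fasting_glucose_mmol_l: Optional[float] = None,
    hba1c_mmol_mol: Optional[float] = None,
) -> bool:
    """Return True if any one of the three criteria is met.

    Hypothetical helper: blood values are optional because they were
    only measured at T4; a missing value never triggers a criterion.
    """
    if self_reported_t2d:
        return True
    if (fasting_glucose_mmol_l is not None
            and fasting_glucose_mmol_l >= FASTING_GLUCOSE_CUTOFF_MMOL_L):
        return True
    if (hba1c_mmol_mol is not None
            and hba1c_mmol_mol >= HBA1C_CUTOFF_MMOL_MOL):
        return True
    return False
```

Under this sketch, a participant without a self-report at T2 or T3 can only be classified through the questionnaire, whereas at T4 either blood criterion suffices.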

Clinical measurements
Blood samples were collected by venipuncture in a fasting state and transferred to the Lifelines central laboratory for analysis. Serum levels of glucose and HbA1c were subsequently analyzed. Anthropometry was measured by trained research staff following standardized protocols. These measurements were performed without shoes and heavy clothing. Family history of diabetes was assessed by self-administered questionnaires. Participants were considered to have a family history of diabetes if they reported having a first-degree relative (i.e., parent, sibling, or child) ever being diagnosed with type 2 diabetes.

Assessment of lifestyle factors and sociodemographic covariates
Age, smoking status, TV watching time per day, and education were assessed by self-administered questionnaires. Highest education achieved was categorized as: (1) low: junior general secondary education or lower; (2) middle: secondary vocational education and senior general secondary education; and (3) high: higher vocational education or university.
Habitual physical activity level of a normal week was assessed by the Short QUestionnaire to ASsess Health-enhancing physical activity (SQUASH). The SQUASH was pre-structured into four domains: commuting, leisure time, household, and occupational activities. For each reported activity, frequency (days per week) and duration (average time per day) were asked. From the SQUASH data, non-occupational moderate-to-vigorous physical activity (MVPA), including commuting and sports (if ≥ 4.0 MET), was calculated in minutes per week. The SQUASH has been validated in the general population using objective accelerometer measurements for a 2-week period (Wendel-Vos et al., 2003).
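As an illustration of how weekly MVPA minutes could be derived from frequency and duration, here is a minimal sketch. The `Activity` record, its `category` labels, and the choice to apply the ≥ 4.0 MET threshold to both commuting and sports are assumptions made for this example; the actual SQUASH scoring rules may differ in detail.

```python
from dataclasses import dataclass
from typing import Iterable

MVPA_MET_THRESHOLD = 4.0
# Assumed category labels for the non-occupational activities counted as MVPA.
INCLUDED_CATEGORIES = {"commuting", "sports"}


@dataclass
class Activity:
    category: str           # hypothetical label, e.g. "commuting", "sports", "occupational"
    met: float              # assigned intensity in MET
    days_per_week: float    # reported frequency
    minutes_per_day: float  # reported average duration


def mvpa_minutes_per_week(activities: Iterable[Activity]) -> float:
    """Sum frequency x duration over included activities at or above 4.0 MET."""
    return sum(
        a.days_per_week * a.minutes_per_day
        for a in activities
        if a.category in INCLUDED_CATEGORIES and a.met >= MVPA_MET_THRESHOLD
    )
```

Occupational activities and activities below the intensity threshold contribute nothing to the total in this sketch.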
Dietary intake was assessed by a semi-quantitative self-administered food frequency questionnaire (FFQ). The FFQ aimed to assess the habitual intake of 110 food items (including alcohol) during the past 4 weeks. For 46 main food items (such as bread and milk), frequency of consumption was indicated as 'not this month' or in days per week or month, including the amount (in units or specified portion sizes) consumed each time. The FFQ also included 37 questions on intake of sub-items (such as different types of cheese) for which frequency was specified as never, sometimes, often, and always. The FFQ was designed based on the validated Dutch FFQ (Molag et al., 2010). In brief, the intake of the food items and the energy intake have been tested and validated against three 24-h dietary recalls and actual energy intake in controlled feeding trials, respectively (Siebelink et al., 2011;Streppel et al., 2013). The Lifelines Diet Score (LLDS) was calculated to evaluate the relative diet quality of each participant (Vinke et al., 2018).

Lifestyle pattern analysis with latent class analysis
Lifestyle patterns were derived using latent class analysis (LCA). LCA is a latent variable mixture model that relates a set of observed indicators (i.e., lifestyle variables) to a set of latent variables (i.e., lifestyle pattern classes) (Hagenaars and McCutcheon, 2002). LCA enables the analysis and interpretation of higher-order interactions among lifestyle factors, which overcomes the issue of collinearity between lifestyle factors (Lanza and Rhoades, 2013;Rabe-Hesketh and Skrondal, 2008).
The LCA output mainly consists of two parts. The first part is the posterior class probability, which estimates the probability of an individual belonging to each latent class given the individual's observed response on the measured indicators. Each participant was assigned to the lifestyle pattern group for which they had the highest posterior class probability. A number of mutually exclusive lifestyle pattern groups would thus be identified. The second part is the class-specific response probability, which estimates the likelihood that an individual, who belongs to a particular latent class, adheres to a certain measured indicator, such as the probability of being a never smoker (Hagenaars and McCutcheon, 2002).
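For readers who prefer notation, the two quantities can be written as follows. This is the textbook form of a latent class model under the usual local independence assumption, not a formula taken from the study; the symbols are ours: $\pi_c$ is the prevalence of class $c$, $y_{ij}$ is the observed response of individual $i$ on indicator $j$, and $\rho_{j}(y \mid c)$ is the class-specific response probability.

```latex
% Posterior class probability (Bayes' rule under local independence)
P(C = c \mid \mathbf{y}_i)
  = \frac{\pi_c \prod_{j=1}^{J} \rho_{j}(y_{ij} \mid c)}
         {\sum_{k=1}^{K} \pi_k \prod_{j=1}^{J} \rho_{j}(y_{ij} \mid k)}

% Modal assignment: each individual goes to the class with the highest posterior
\hat{c}_i = \arg\max_{c \in \{1,\dots,K\}} P(C = c \mid \mathbf{y}_i)
```

The second line corresponds to the assignment rule described above, which yields mutually exclusive groups.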
Since LCA requires that items are measured categorically, we further defined lifestyle factors into risky versus non-risky categories based on evidence, resulting in nine indicators. The interpretation of the results also becomes clearer when lifestyle factors are categorized into risky versus non-risky groups. Specifically, smoking status, i.e., never, former, and current smoker, was treated as three dummy variables. Alcohol intake was categorized as risk drinking (>15 g alcohol/day) versus non-risk drinking (≤15 g alcohol/day) (Ding et al., 2021). This amount approximates one drink per day. TV watching time was categorized as excessive TV watching (highest sex-specific tertile) versus non-excessive TV watching (other tertiles). LLDS was divided into sex-specific tertiles. Physical activity level was categorized according to whether the participant met the Dutch recommendation for physical activity, i.e., ≥150 min non-occupational MVPA per week (Weggemans et al., 2018).
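The categorization above can be sketched as a small coding function. One way to arrive at nine indicators is three smoking dummies, one risk-drinking flag, one excessive-TV flag, three LLDS tertile dummies, and one physical-activity flag; that breakdown is our reading of the text, and all names below are hypothetical. The sex-specific tertiles are assumed to be computed beforehand and passed in as 1 (lowest) to 3 (highest).

```python
from typing import Dict

RISK_DRINKING_G_PER_DAY = 15.0   # risk drinking if strictly above this
MVPA_GUIDELINE_MIN_PER_WEEK = 150.0


def code_indicators(
    smoking_status: str,       # "never", "former", or "current"
    alcohol_g_per_day: float,
    tv_tertile: int,           # sex-specific tertile, 1 (lowest) to 3 (highest)
    llds_tertile: int,         # sex-specific tertile, 1 (lowest) to 3 (highest)
    mvpa_min_per_week: float,
) -> Dict[str, int]:
    """Return nine 0/1 indicators for one participant (illustrative coding)."""
    if smoking_status not in {"never", "former", "current"}:
        raise ValueError(f"unknown smoking status: {smoking_status!r}")
    return {
        "never_smoker": int(smoking_status == "never"),
        "former_smoker": int(smoking_status == "former"),
        "current_smoker": int(smoking_status == "current"),
        "risk_drinking": int(alcohol_g_per_day > RISK_DRINKING_G_PER_DAY),
        "excessive_tv": int(tv_tertile == 3),
        "llds_lowest_tertile": int(llds_tertile == 1),
        "llds_middle_tertile": int(llds_tertile == 2),
        "llds_highest_tertile": int(llds_tertile == 3),
        "meets_pa_guideline": int(mvpa_min_per_week >= MVPA_GUIDELINE_MIN_PER_WEEK),
    }
```

Note the boundary handling: exactly 15 g/day counts as non-risk drinking and exactly 150 min/week meets the guideline, matching the inequalities in the text.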
A series of latent class models were examined with three through nine classes. We selected the best-fitting latent class solution based on Bayesian information criterion with log likelihood for the number of parameters adjusted (BIC-LL). BIC-LL is a model goodness-of-fit index, for which a lower value is preferred (Nylund et al., 2007). We also considered other model goodness-of-fit indices (Supplementary Table S1) (Hagenaars and McCutcheon, 2002), as well as the interpretability of the identified lifestyle patterns. LCA was performed with LatentGOLD (version 5.0.0.14260; Statistical Innovations Inc., Belmont, MA, USA) (Vermunt and Magidson, 2005).
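The selection rule can be sketched as follows, assuming the commonly used definition BIC = -2·LL + p·ln(N), where LL is the model log-likelihood, p the number of estimated parameters, and N the sample size. We believe this matches the BIC based on the log-likelihood that LatentGOLD reports, but treat that as an assumption; the function names are hypothetical, and the final choice in the study also weighed other fit indices and interpretability.

```python
import math
from typing import Dict, Tuple


def bic_from_loglik(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """BIC = -2 * LL + p * ln(N); lower values indicate better fit."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def class_count_with_lowest_bic(
    models: Dict[int, Tuple[float, int]],  # n_classes -> (log-likelihood, n_params)
    n_obs: int,
) -> int:
    """Return the number of classes whose model has the lowest BIC."""
    return min(
        models,
        key=lambda k: bic_from_loglik(models[k][0], models[k][1], n_obs),
    )
```

Because the penalty term grows with the number of parameters, a model with more classes is only preferred when its gain in log-likelihood outweighs the added complexity.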

Risk of type 2 diabetes
Associations between lifestyle patterns and incident type 2 diabetes were estimated using Cox proportional hazards regression models. Non-diabetes cases were censored at the last time-point for which data was available. Additionally, all participants were censored after 60 months. Analyses were adjusted in a stepwise manner for (1) age, sex, and total energy intake; (2) education; (3) BMI; (4) family history of diabetes; and (5) blood glucose level at baseline. The proportional hazards assumption was assessed by calculating the Schoenfeld residuals and by performing Cox regression models with time-dependent covariates. Potential effect modification was evaluated for age, sex, BMI, education, and family history of diabetes. Analyses were repeated excluding participants who had less than 12 months of follow-up, in an attempt to address possible reverse causation caused by short follow-up time. For comparison, we additionally tested the associations of incident type 2 diabetes with each lifestyle risk factor separately. Statistical analyses for calculating the risk of type 2 diabetes were performed in Stata (version 13.1; StataCorp, College Station, TX, USA).
To obtain insights into the lifestyle-related diabetes disease burden, namely the fraction of cases preventable if having a healthy lifestyle profile, we calculated the adjusted population attributable fraction (PAF) based on the odds ratios estimated using logistic regression models adjusting for the abovementioned Cox proportional hazards model covariates. The calculation of PAFs was performed using the punaf package in Stata, as described by Newson (Newson, 2013).
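The censoring rules described above determine the (time, event) pair that enters the Cox model for each participant. A minimal sketch of that bookkeeping follows; the function name is hypothetical, times are in months since baseline, and treating an event at exactly 60 months as observed is our assumption.

```python
from typing import Optional, Tuple

CENSORING_HORIZON_MONTHS = 60.0


def survival_record(
    event_month: Optional[float],   # month of incident type 2 diabetes, None if no event
    last_followup_month: float,     # last time-point with available data
    horizon: float = CENSORING_HORIZON_MONTHS,
) -> Tuple[float, int]:
    """Return (follow-up time in months, event indicator 0/1)."""
    if event_month is not None and event_month <= horizon:
        return (event_month, 1)
    if event_month is not None:
        # Event occurred after the horizon: administratively censored.
        return (horizon, 0)
    # No event: censored at last available data, capped at the horizon.
    return (min(last_followup_month, horizon), 0)
```

For example, under these assumptions a participant without diabetes whose last data point is at 72 months contributes 60 months of event-free follow-up.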
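In general terms, a regression-based PAF of this kind compares the expected number of cases under the observed exposure distribution with the expected number under a counterfactual scenario. The expression below is a sketch of that idea in our own notation, not the exact estimator implemented in punaf: $\hat{p}_i$ is the fitted probability of type 2 diabetes for participant $i$ given their observed lifestyle pattern and covariates, and $\hat{p}_i^{\,*}$ is the fitted probability when the lifestyle pattern is set to the "healthy lifestyle group" with covariates unchanged. For a group-specific PAF, the sums would run over the members of that group.

```latex
\mathrm{PAF}
  = 1 - \frac{\sum_{i=1}^{N} \hat{p}_i^{\,*}}{\sum_{i=1}^{N} \hat{p}_i}
```

A PAF of about one third therefore means that the counterfactual expected case count is about two thirds of the expected count under the observed lifestyle pattern.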

Lifestyle patterns
After examining models with three through nine latent classes, we selected a 5-latent class model (five lifestyle patterns) since it offered the lowest BIC-LL value (best model fit) and the best subjective interpretability. Most of the other model goodness-of-fit indices also showed their best values at the 5-latent class model solution. Supplementary Table S1 shows the detailed model goodness-of-fit indices for all models tested.
Fig. 1 and Supplementary Table S2 show the estimated probabilities of adhering to lifestyle factors for the lifestyle patterns identified. The first pattern was named the "healthy lifestyle group" (n = 27,413, 44.3 %), as it was characterized by moderate to low probabilities across all lifestyle risk factors. The second pattern was designated as the "poor diet and low physical activity group" (n = 13,846, 22.4 %), because it was characterized primarily by moderate to high probabilities of poor diet quality (lowest tertile of LLDS) and insufficient physical activity. The third pattern was labelled the "unhealthy lifestyle group" (n = 12,031, 19.5 %), since it was characterized by moderate to low probabilities of risk drinking and former smoker, but moderate to high probabilities across all other lifestyle risk factors. The fourth pattern was named the "couch potato group" (n = 4726, 7.6 %). Persons in this pattern had moderate to high probabilities of excessive TV watching and also notably former smoker, but they had moderate to low probabilities elsewhere. The fifth pattern was labelled the "risk drinker group" (n = 3853, 6.2 %), as persons in this pattern mainly had very high probability of risk drinking and moderate to high probability of former smoker.

Baseline characteristics
Baseline characteristics for each lifestyle pattern group are shown in Table 1. Participants from the "poor diet and low physical activity group" and the "unhealthy lifestyle group" tended to be younger, while participants from the latter group and the "couch potato group" tended to be less educated. In total, 59.6 % of the participants included in the analysis were female, whereas the "risk drinker group" had more male participants (61.1 %). Clinical biomarkers showed diverse distributions among different groups. The "couch potato group" had the highest prevalence of family history of diabetes (10.2 %).
Table 2 shows the associations between different lifestyle pattern groups and risks of incident type 2 diabetes. Among 61,869 participants included in the analysis, we identified 900 cases of type 2 diabetes during follow-up (205,696 person-years; median [interquartile range] follow-up time, 41 [29-50] months; incidence rate 4.38 per 1000 person-years). The incidence rates of type 2 diabetes ranged from 3.51 per 1000 person-years for the "healthy lifestyle group" to 6.42 per 1000 person-years for the "unhealthy lifestyle group". In the fully adjusted model (model 5) using the "healthy lifestyle group" as the low-risk reference group, the "risk drinker group" (HR 1.03 [95 % CI 0.77, 1.39]) and the "couch potato group" (HR 0.98 [95 % CI 0.76, 1.25]) were not associated with incident type 2 diabetes, whereas the "poor diet and low physical activity group" (HR 1.26 [95 % CI 1.03, 1.55]) and the "unhealthy lifestyle group" (HR 1.51 [95 % CI 1.24, 1.85]) had significantly higher risks of incident type 2 diabetes. Supplementary Table S3 shows the associations using the "unhealthy lifestyle group" as reference. The associations between lifestyle pattern groups and risks of incident type 2 diabetes were not significantly modified by age, sex, BMI, education, or family history of diabetes (all p interaction > 0.05).
Results were essentially unchanged when excluding participants who had less than 12 months of follow-up (Supplementary Table S4). Supplementary Table S5 presents the PAFs for each lifestyle pattern group using the "healthy lifestyle group" as reference. Supplementary Table S6 shows the associations between single lifestyle factors and incident type 2 diabetes.
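The overall incidence rate reported above follows directly from the case count and the accumulated person-years. As a small worked check (the function name is ours):

```python
def incidence_rate_per_1000_person_years(cases: int, person_years: float) -> float:
    """Crude incidence rate expressed per 1000 person-years."""
    return 1000.0 * cases / person_years


# Figures taken from the text: 900 cases over 205,696 person-years.
overall_rate = incidence_rate_per_1000_person_years(900, 205_696)
print(round(overall_rate, 2))  # 4.38
```

This is a crude rate; the hazard ratios in Table 2 additionally adjust for the covariates listed in the statistical methods.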

Discussion
There are two main findings of our study. First, using a large population-based sample, we identified five lifestyle patterns. Second, we found that different combinations of lifestyle risk factors, as manifested in lifestyle patterns, were differentially associated with risk of developing type 2 diabetes.

Lifestyle patterns and risk of incident type 2 diabetes
There is robust evidence showing that avoiding risky lifestyle behaviors is effective in the prevention of type 2 diabetes (Farhadnejad et al., 2022;Zhang et al., 2020). For example, an Iranian study found that a higher healthy lifestyle score, characterized by no smoking, normal body weight, vigorous physical activity, and healthy diet, was associated with up to 75 % lower risk of type 2 diabetes, independent of multiple confounders (Farhadnejad et al., 2022). The current analysis extends previous knowledge by considering multiple co-occurring lifestyle risk factors simultaneously in the form of real-life lifestyle patterns in the general population. We are aware of only two other studies that have applied a lifestyle pattern approach when predicting the risk of type 2 diabetes. One study from the US Women's Health Initiative cohort found that the "poor diet and low exercise pattern" and the "high multiple lifestyle and psychosocial risks pattern" were associated with higher risks of incident type 2 diabetes (Hendryx et al., 2020). Likewise, the Dutch HELIUS cohort study of a multi-ethnic population reported unhealthy lifestyle patterns were associated with higher risks of developing type 2 diabetes (van Etten et al., 2020). Despite the differences in risk factors and patterns considered that preclude direct comparisons between previous evidence and our results, taken together, these findings support an important role of lifestyle patterns in the development of type 2 diabetes.
The classic approach of studying single lifestyle factors usually assumes independent effects of each lifestyle factor and does not account for their interrelations (Davis et al., 2019; Ding et al., 2015; Hendryx et al., 2020; Luo et al., 2021; Meader et al., 2016; Noble et al., 2015; van Etten et al., 2020). Although further investigation is warranted, we did observe that the risks related to different lifestyle patterns were neither additive nor proportionate to the number of risk factors present, especially compared with the effect sizes when studying each lifestyle factor separately (Supplementary Table S6). Notably, the "couch potato group" was not associated with risk of type 2 diabetes, especially after adjustment for BMI. This counterintuitive finding suggests that BMI may play an important role in the studied associations for participants from this lifestyle pattern group. As such, the average effects estimated for a single lifestyle risk factor may not be accurate for a substantial proportion of the study population. Alternatively, a lifestyle pattern may be a proxy for an underlying behavioral variable that is not measured, but nevertheless relevant.

Methodological considerations
Our study was conducted in a single cohort, albeit a large one. Accordingly, the generalizability and reproducibility of the current lifestyle pattern analysis require further substantiation from independent cohorts. Various lifestyle patterns have been identified, but in a limited number of studies. At least partly, this is due to the heterogeneity of the source data, namely the numbers and categorization of lifestyle factors in different studies. Nevertheless, true differences in lifestyle patterns may exist between different populations. Analysis of differences and similarities in lifestyle patterns between populations would be highly relevant for identifying generic as well as specific patterns. So far, patterns primarily characterized by minimal risk behaviors, maximal risk behaviors, and poor diet combined with low physical activity have been commonly identified. Patterns characterized by risk drinking generally showed large variations in their coexisting lifestyle risk factors across studies, which may be partly attributed to the lack of an evidence-based definition of risk drinking (Davis et al., 2019; Hendryx et al., 2020; Luo et al., 2021; Noble et al., 2015; van Etten et al., 2020; Watts et al., 2016). Using a normalized lifestyle evaluation scheme may therefore benefit the reproducibility and generalizability of the identified patterns to other populations.

Implications for public health prevention
In our analysis, participants from the "healthy lifestyle group" formed the largest group (44.3 %), although conspicuously their lifestyles were still not entirely optimal. Nevertheless, our analysis on lifestyle-related disease burden did show that substantial public health benefits could be obtained. For instance, approximately one third of the diabetes cases in the "unhealthy lifestyle group" could be prevented if participants in this group had the same lifestyle pattern as the "healthy lifestyle group" (Supplementary Table S5). Current evidence supports the relevance of targeting multiple lifestyle risk factors simultaneously (Meader et al., 2016; Noble et al., 2015). Although certain efforts in diabetes prevention have been made on improving diet quality and physical activity, other lifestyle risk factors and within-population heterogeneity in the distribution of lifestyle factors have often been overlooked (Kivela et al., 2020). As observed in our population, lifestyle factors may coexist with each other in a counterintuitive manner. The "couch potato group", characterized by excessive TV watching, also had the highest level of non-occupational MVPA. The differential risks found for each lifestyle pattern group further emphasize the importance and relevance of considering different lifestyle patterns when designing lifestyle programs, rather than adopting the generic one-size-fits-all approach.

Strengths and limitations
Strengths of our study include a large sample size and the availability of data on TV watching time as an emerging lifestyle factor. Sensitivity analyses supported the robustness of our findings. We exclusively studied lifestyle risk factors without conflation of lifestyle with its health outcomes (e.g., obesity status). However, a number of limitations are worth mentioning. First, over-reporting of healthy lifestyle behaviors due to social desirability is possible (Newell et al., 1999). Nevertheless, in our study this over-reporting might mainly compromise the discrimination power of the identification of lifestyle clusters. Second, possible changes in lifestyle behaviors might be relevant but were not assessed. Third, as the Lifelines cohort mainly consists of participants in the northern Netherlands, it might not be possible to extrapolate our results to other population groups. Furthermore, in LCA, the assignment of lifestyle pattern group for individuals was based on their highest posterior probability class membership, which unfortunately cannot account for the uncertainty of the classification. Finally, we could not analyze the potential impacts of loss to follow-up (23.0 %) among eligible participants. Nonetheless, the baseline characteristics of those who had no follow-up data were comparable with the study population, except for some minor differences (Supplementary Table S7). Simulation studies suggested that such attrition bias may only have limited influences on estimates of associations in cohort studies (Howe et al., 2013; Peters et al., 2012).

Conclusions
In conclusion, focusing on five lifestyle factors, namely smoking, overall diet quality, TV watching time, physical activity, and risk drinking, we identified five groups of individuals with different lifestyle patterns using a data-driven approach in a large population-based sample. These five lifestyle patterns were differentially associated with risk of developing type 2 diabetes. The observed clustering extends previous knowledge that lifestyle factors tend to cluster, particularly in behavioral patterns within a general and heterogeneous population. Our findings pave the way for a more effective strategy for public health prevention of type 2 diabetes through targeting multiple lifestyle risk factors simultaneously.

Ethics approval
The Lifelines cohort study is conducted according to the principles of the Declaration of Helsinki and in accordance with the research code of the University Medical Center Groningen (approval number 2007/152). All participants received detailed information about the Lifelines cohort study and signed informed consent.

Data availability
The manuscript is based on the data from the Lifelines cohort study. Lifelines adheres to standards for data availability. The data catalogue of the Lifelines cohort study is publicly accessible at https://www.lifelines.nl. All international researchers can obtain data at the Lifelines research office (research@lifelines.nl), for which a fee is required. The Lifelines research system allows access for reproducibility of the study results.

Funding
This project has received funding from the European Union's Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement no. 754425. The Lifelines Biobank initiative has been made possible by subsidies from the Dutch Ministry of Health, Welfare and Sport, the Dutch Ministry of Economic Affairs, the University Medical Center Groningen (UMCG), University of Groningen, and the Provinces in the north of The Netherlands (Drenthe, Friesland, and Groningen). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability
The authors do not have permission to share data.