Variations in sociodemographic and health-related factors are linked to distinct clusters of individuals with depression based on the PHQ-9 instrument: NHANES 2007-2018

ABSTRACT
Background: Depression is a heterogeneous disease. Identification of latent depression subgroups and differential associations across these putative groups and sociodemographic and health-related factors might pave the way toward targeted treatment of individuals.
Methods: We used model-based clustering to identify relevant subgroups of 2900 individuals with moderate to severe depression (defined as scores ≥ 10 on the PHQ-9 instrument) from the NHANES cross-sectional survey. We used ANOVA and chi-squared tests to assess associations between cluster membership and sociodemographics, health-related variables, and prescription medication use.
Results: We identified six latent clusters of individuals, three based on depression severity and three differentially loaded by somatic and mental components of the PHQ-9. The Severe mental depression cluster had the most individuals with low education and income (P < 0.05). We observed differences in the prevalence of numerous health conditions, with the Severe mental depression cluster showing the worst overall physical health. We observed marked differences between the clusters regarding prescription medication use: the Severe mental depression cluster had the highest use of cardiovascular and metabolic agents, while the Uniform severe depression cluster showed the highest use of central nervous system and psychotherapeutic agents.
Limitations: Due to the cross-sectional design we cannot make conclusions about causal relationships. We used self-reported data. We did not have access to a replication cohort.
Conclusions: We show that socioeconomic factors, somatic diseases, and prescription medication use are differentially associated with distinct and clinically relevant clusters of individuals with moderate to severe depression.


Introduction
Depression is one of the most common mental disorders, with ~8.4 % of the US adult population exhibiting symptoms of major depression, although the prevalence is known to vary between age and racial groups and according to socioeconomic determinants (Fryers et al., 2003; Li et al., 2022; Lorant et al., 2003). The nine-question Patient Health Questionnaire 9 (PHQ-9) instrument (Kroenke et al., 2001; Levis et al., 2019) is a commonly used and accurate tool to screen for depression, and it has been validated for primary care use in the USA (Cameron et al., 2008).
Depression is not a uniform disorder, and different individuals may experience various symptoms related to the disease. Identifying clinically meaningful subgroups of individuals with depression is an essential step toward personalized and effective treatment approaches (Bondar et al., 2020) and could have significant implications for clinical practice. For instance, precision medicine approaches could be developed to prioritize specific patient subgroups for alternative interventions, such as behavioral or pharmaceutical treatments. Furthermore, subgroups defined by different depression profiles might have varying risks of developing further mental (Hybels et al., 2013) or somatic (Hawkins et al., 2014) diseases, making it crucial to investigate potential subgroups. As clinicians aim to provide the best care for their patients, understanding the distinct characteristics of depression subgroups can help them tailor treatment plans that fit each patient's unique needs.
In the past decade, there have been several attempts to detect latent classes of individuals with depression based on a wide range of instruments, using various methods, largely identifying depression subtypes based on depression severity (Ulbricht et al., 2018; Van Loo et al., 2012). Thus far, inconsistent reporting of statistical methodologies, too homogeneous or too specific study samples, and unclear interpretation of results have hampered progress in this field, making it challenging for clinicians to apply findings in clinical practice (Beard et al., 2016; Elhai et al., 2012; Ulbricht et al., 2018). Therefore, we find it essential to investigate subgroups based on different symptomatology rather than severity to identify clinically relevant subtypes. We hypothesize that through the investigation of a large, diverse adult population with moderate to severe depression, we will be able to identify meaningful latent depression subgroups that go beyond clustering individuals according to depression severity.
We will use a model-based clustering approach in a large representative population from the USA to investigate latent clusters in adults with moderate, moderately severe, and severe depressive symptoms. We will further assess whether a wide range of sociodemographics, health-related factors, and prescription medication use are associated with the obtained depression clusters. We hypothesize that we will observe differences between these factors and the putative latent depression subgroups, and these differences might help explain the observed differences between depression clusters and thereby pave the way toward the targeted treatment of individuals in these groups.

Study population
The National Health and Nutrition Examination Survey (NHANES) is a repeated cross-sectional study that provides publicly available health data on randomly selected representative samples of the civilian, noninstitutionalized US population in two-year intervals (Center for Disease Control and Prevention, n.d.). Minorities and persons older than 60 years are oversampled. A wide range of data is collected, including sociodemographic, disease history, prescription, and clinical laboratory data. A detailed description of the NHANES study is available elsewhere (Center for Disease Control and Prevention, n.d.). For the analyses presented in this paper, we included NHANES survey content between 2007 and 2018, sampled in six independent 2-year batches (2007-2008, 2009-2010, …, 2017-2018). A total of 59,842 individuals participated in NHANES between 2007 and 2018, and 57,414 participants were surveyed at a mobile examination center (MEC) and had non-zero MEC survey weights (we excluded 2428 individuals with a MEC weight of 0, which indicated that the person was not selected in the sample or did not attend the MEC examination and was therefore not eligible for analysis). In total, 31,623 individuals responded to at least one of the nine questions in the depression screening instrument, the Patient Health Questionnaire-9 (PHQ-9). The population of interest in our study was the adult US population who scored ≥10 on the PHQ-9 scale (exhibiting at least moderate depression); thus, the final sample used for our analysis consisted of 2900 individuals. The NHANES study was reviewed by the National Center for Health Statistics Research Ethics Review Board and approval was obtained bi-annually between each full proposal review. Informed consent was obtained from all participants.

Variables of interest
To investigate associations between depressive symptoms and other factors, we extracted sociodemographic and health-related variables of interest as well as data on prescription medication use from the NHANES database. These variables included: sex, marital status, educational attainment, ethnicity, income, family poverty index, smoking status, alcohol consumption, and physical activity status. We included dichotomous indicator variables regarding physical conditions; namely, whether participants have ever had the following diseases: asthma, arthritis, angina, stroke, heart failure, heart attack, coronary heart disease, diabetes, hypertension, hypercholesterolemia, liver conditions, thyroid conditions, bronchitis, and cancer or other malignancies. We also extracted the continuous variables age and body mass index. Prescription medication use (yes/no, and type of medication) was extracted for all individuals. In total, 17 classes of prescription medications were present in the dataset. Variable codings are described in Supplemental Text 1.

Statistical analysis
All analyses were undertaken using R 4.1.2 (R Core Team, 2013). The missing data of the 57,414 included individuals (all those who had non-zero MEC weights) were imputed using multivariate imputation by chained equations (Van Buuren and Groothuis-Oudshoorn, 2011). For all variables, random forest was utilized (number of iterations = 5), and 15 imputed copies were generated according to the total fraction of missingness (13.8 %). The imputation models included the sampling weights to account for the complex survey design (Kim et al., 2006). Convergence was visually inspected for randomly selected datasets. The modes of the imputed responses to the individual PHQ-9 questions were identified across the 15 datasets and these values were used in all downstream analyses.
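The step of collapsing the 15 imputed copies into a single response per PHQ-9 item can be illustrated with a minimal Python sketch. This is not the authors' R code: the function name `modal_response` and the example values are hypothetical, and the tie-breaking rule (smallest value wins) is an assumption, as the text does not state how ties were handled.

```python
from collections import Counter

def modal_response(imputed_values):
    """Return the most frequent value across imputed copies.

    Ties are broken in favour of the smallest value; this rule is an
    assumption made for illustration only.
    """
    counts = Counter(imputed_values)
    top = max(counts.values())
    return min(value for value, count in counts.items() if count == top)

# Hypothetical imputed answers (0-3) to one PHQ-9 item for one participant,
# one value per imputed copy (15 copies in total).
copies = [2, 2, 3, 2, 1, 2, 2, 3, 2, 2, 1, 2, 3, 2, 2]
item_value = modal_response(copies)
```

For participants with an observed response, all copies hold the same value, so the mode simply returns the observed answer.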
To identify clusters of the nine PHQ-9 questions, we used hierarchical clustering using the hclustvar() function of the ClustOfVar R package (Chavent et al., 2011). To identify clusters of individuals with depression, we utilized a model-based clustering algorithm based on maximum integrated complete-data likelihood, using the VarSelCluster() function implemented in the VarSelLCM R package (Marbac and Sedki, 2017, 2019). The clustering of individuals was solely based on the nine PHQ-9 questions, coded as ordered factor variables (response option order: "0" < "1" < "2" < "3"). We tried cluster numbers 1 to 10, and options with and without variable selection. In all instances, all nine variables were utilized; thus, variable selection did not play a role. The six-cluster solution was selected as optimal based on the best Bayesian Information Criterion (BIC) value (Supplemental Table 1). Probabilities for question responses in each cluster were visualized. Chi-squared tests were conducted to assess the under- and overrepresentation of response options for each question within the clusters by comparing the observed and expected counts. As each of the four response options to each of the nine questions was compared with the expected counts in each of the six clusters, 216 pairwise chi-squared tests were undertaken in total. Thus, a Bonferroni-corrected α = 0.05/216 = 2.3 × 10⁻⁴ was used as a threshold for statistical significance in this step.
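The multiple-testing arithmetic above can be checked with a short Python sketch. The count of 216 follows from 6 clusters × 9 questions × 4 response options. The 2×2 test below (in cluster vs. not, chose the option vs. did not, no continuity correction) is one plausible reading of "comparing observed and expected counts" and is an assumption, as are the function name and the example counts; the text does not specify the exact table layout.

```python
import math

N_CLUSTERS, N_QUESTIONS, N_OPTIONS = 6, 9, 4
n_tests = N_CLUSTERS * N_QUESTIONS * N_OPTIONS   # 216 tests
alpha_bonferroni = 0.05 / n_tests                # about 2.3e-4

def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test (1 df, no continuity correction)
    for the 2x2 table [[a, b], [c, d]]. Returns (statistic, p_value)."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of a chi-squared variable with 1 df.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts: within one cluster, 300 chose the option and 63 did
# not; among everyone else, 500 chose it and 2037 did not.
stat, p_value = chi2_2x2(300, 63, 500, 2037)
is_significant = p_value < alpha_bonferroni
```

The closed-form p-value holds only for one degree of freedom; a table with more cells would need a general chi-squared distribution function.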
Associations between cluster membership and sociodemographics, health-related variables, and prescription medication use were assessed using one-way analysis of variance (ANOVA) tests and chi-squared tests. The Mobile Examination Center (MEC) subsample weights (Center for Disease Control and Prevention, n.d.) were used to correct for representativeness in statistical analyses. Descriptive statistics from the multiple imputation framework were pooled using Rubin's Rules (Rubin, 1987), and median P values from the ANOVA and chi-squared tests across the 15 imputed copies were reported, as previously suggested (Eekhout et al., 2017).
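As an illustration of the pooling step, the following Python sketch applies Rubin's Rules to a point estimate and its variance across imputed copies, and takes the median of the per-copy P values. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names and all numbers are hypothetical.

```python
import statistics

def pool_rubin(estimates, variances):
    """Pool one estimate across m imputed copies with Rubin's Rules.

    Returns (pooled_estimate, total_variance), where
    total_variance = within + (1 + 1/m) * between.
    """
    m = len(estimates)
    pooled = statistics.fmean(estimates)
    within = statistics.fmean(variances)       # mean within-copy variance
    between = statistics.variance(estimates)   # sample variance across copies
    return pooled, within + (1 + 1 / m) * between

def median_p(p_values):
    """Median P value across imputed copies."""
    return statistics.median(p_values)

# Hypothetical per-copy results (3 copies shown for brevity; the study used 15).
pooled_mean, total_var = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
p_reported = median_p([0.01, 0.03, 0.02])
```

The between-copy term is what makes the pooled variance larger than the average within-copy variance, reflecting uncertainty due to missing data.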

Results
The overall characteristics of the 2900 individuals included in the analyses are shown in Table 1. From the considered population of individuals scoring at least 10 on the PHQ-9 scale, 63.5 % were classified as moderately depressed, 26.2 % as moderately severely depressed, and 10.3 % as severely depressed.
Using hierarchical clustering, we visualized the dendrogram of the nine PHQ-9 questions (Fig. 1) and identified two major question clusters primarily related to the somatic (Q3, Q4, Q5, Q7, Q8) and the mental (Q1, Q2, Q6, Q9) aspects of depression. We further observed two sub-clusters among the somatic question group related to behavioral changes, loss of concentration, and eating problems (Q5, Q7, Q8), and sleep and energy loss (Q3, Q4), and two sub-clusters among the mental question group related to guilt, sadness, and anhedonia (Q1, Q2, Q6) and suicidal thoughts (Q9).
The optimal model yielded six latent clusters of individuals with distinct patterns of under- and overrepresentation of responses to the PHQ-9 questions (Supplementary Table 2). Three clusters, the Uniform moderate depression cluster (n = 590, 20.3 %), the Uniform moderately severe depression cluster (n = 436, 15.0 %), and the Uniform severe depression cluster (n = 363, 12.6 %), showed consistent overrepresentation of response options 1 (several days), 2 (more than half of the days), and 3 (nearly every day), respectively, to the vast majority of the nine questions (P_Bonferroni < 0.05). The Severe mental depression cluster (n = 268, 9.2 %) was characterized by an overrepresentation of response option 3 to all questions except for the somatic aspects of the PHQ-9 instrument related to tiredness, lack of energy, or poor sleep, and overeating or problems with appetite, where response option 0 (not at all) was overrepresented (P_Bonferroni < 0.05). The Severe somatic symptom profile cluster (n = 788, 27.2 %) showed an overrepresentation of response option 3 to the same three somatic questions mentioned above, and response option 0 was overrepresented for the rest of the questions (P_Bonferroni < 0.05). Last, the Moderate somatic depression cluster (n = 455, 15.7 %) showed a mixed profile with an overrepresentation of response option 2 to the three somatic questions mentioned above, and response option 1 to most of the other questions (P_Bonferroni < 0.05). The percentages of individuals with moderate, moderately severe, and severe depression across the six clusters are shown in Fig. 2.
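The severity grouping used above can be sketched in Python as follows. The cut-offs of 10-14 (moderate), 15-19 (moderately severe), and 20-27 (severe) are the conventional PHQ-9 bands; the text itself only states the ≥ 10 threshold, so the upper bands are an assumption here, and the function names are hypothetical.

```python
def phq9_total(responses):
    """Sum nine PHQ-9 item responses, each coded 0-3."""
    if len(responses) != 9 or any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("expected nine responses coded 0, 1, 2, or 3")
    return sum(responses)

def severity_category(total):
    """Map a PHQ-9 total score to the severity groups used in the study.

    Returns None for scores below 10, which fall outside the study sample.
    """
    if total < 10:
        return None
    if total <= 14:
        return "moderate"
    if total <= 19:
        return "moderately severe"
    return "severe"

# Hypothetical participant: high scores on the first four items only.
example_total = phq9_total([3, 3, 3, 3, 1, 0, 0, 0, 0])
example_category = severity_category(example_total)
```

Two participants with the same total score can have very different item profiles, which is the motivation for clustering on the individual items rather than on the sum.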
Table 1 presents the sociodemographic characteristics of the total population and stratified by cluster membership. As indicated by the cluster analysis, both the established categories of level of depression (moderately, moderately severe, and severely depressed) (P < 0.001) and the continuous total PHQ-9 score (P < 0.001) showed marked differences between the six clusters. The Severe mental depression cluster had the most individuals with a low level of education (44.3 % without GED or college education vs. max. 30.2 % in any other cluster, P < 0.001). This cluster also had the lowest income (P = 0.04), the highest proportions of Blacks and Hispanics (P = 0.001), and the most males across all clusters (50.4 % vs. max. 39 % in any other cluster, P < 0.001). No statistically significant differences were observed in poverty levels across the clusters (P > 0.05).
Table 2 presents the health-related characteristics of the total population and stratified by cluster membership. We observed no statistically significant association between cluster membership and BMI, asthma, angina, liver conditions, and hypercholesterolemia (P > 0.05). The Uniform severe depression cluster showed the highest percentages for heart attack and bronchitis, the Severe somatic symptom profile cluster showed the highest percentages for arthritis and heart failure, while the Severe mental depression cluster showed the highest percentages for coronary heart disease, stroke, cancer, diabetes, and hypertension (all P < 0.05).
Individuals in the Uniform severe depression cluster and the Severe mental depression cluster appeared to take the most medications (Fig. 3). Prescription medication use of 17 classes of agents across the six clusters is presented in Supplementary Table 3 and Supplemental Fig. 1. For most drug classes, we observed a statistically significant difference between clusters. The Uniform severe depression cluster and the Moderate somatic depression cluster showed the highest consumption of central nervous system and psychotherapeutic agents, while the Severe mental depression cluster showed the highest consumption of cardiovascular and metabolic agents (all P < 0.001) (Fig. 4).

Fig. 2. Depression severity and cluster membership in the study population (N = 2900).
A) The weighted percentages of cluster membership across three groups defined by depression severity based on the total PHQ-9 score (moderate/moderately severe/severe depression). B) The weighted percentages of depression severity across the six groups defined by cluster membership.

Discussion
Here we analyzed a multi-ethnic adult population of individuals with moderate to severe depression from the US-based NHANES study. We clustered the nine items of the PHQ-9 instrument to identify latent depression subgroups of this population. We identified six latent clusters; three of the six groups clustered the population according to severity. Two clusters, the Severe mental depression cluster and the Severe somatic symptom profile cluster, delineated population subgroups according to the overrepresentation of response option "3 = nearly every day" to the mental aspects, and to three of the somatic aspects of the PHQ-9 instrument related to tiredness, lack of energy, or poor sleep, and overeating or problems with appetite, respectively (Kroenke et al., 2001; Levis et al., 2019). The remaining cluster showed a mixed phenotype of individuals with moderate depression, with slightly more emphasized somatic symptoms. As the clustering of the nine PHQ-9 variables showed a pattern that grouped aspects related to somatic and mental symptoms, and we identified clusters of individuals that showed distinct patterns of somatic and mental depression, our findings add further evidence of depression being a heterogeneous disease (Beard et al., 2016; Elhai et al., 2012; Krause et al., 2008; Krause et al., 2010; Richardson and Richards, 2008; Sunderland et al., 2013). Furthermore, our findings showing differential associations between a wide range of socioeconomic and health-related factors and prescription medication use suggest clinically meaningful clusters (Van Loo et al., 2012). We addressed several limitations of prior reports by utilizing a multi-ethnic, diverse, nationally representative population, by including a relatively large sample of individuals with moderate to severe depression, and by evaluating model fit using a statistically sound approach. Of note, previous studies have identified depression subgroups related to atypical, psychotic, and melancholic depression (Ulbricht et al., 2018; Van Loo et al., 2012), which could not be identified in our study as the PHQ-9 does not cover these DSM features.
After identifying six latent clusters, we compared sociodemographic and health-related factors, as well as prescription medication use among them. The Severe somatic symptom profile and the Moderate somatic depression cluster had the highest proportion of females, while the Severe mental depression cluster had the highest proportion of males; these general patterns have been observed in previous studies (Carragher et al., 2009; Silverstein et al., 2013). We observed differences in the proportions of ethnic groups across the depression clusters, with White participants being more prevalent in the somatic clusters. A study conducted in the US revealed that minorities with severe depression were less likely to access mental health services compared to Whites (Lee et al., 2014). In our study, Hispanics and Blacks were the most prevalent in the Severe mental depression cluster. While the Uniform severe depression cluster and the Severe somatic symptom profile cluster had an overrepresentation of severe response options to the somatic components of PHQ-9, it was the Severe mental depression cluster that showed the worst overall physical health. This cluster had higher percentages of individuals with coronary heart disease, stroke, diabetes, and hypertension compared to any of the other clusters. Additionally, this cluster had the highest consumption of metabolic and cardiovascular drugs. As expected, the Uniform severe depression cluster, with the highest overall average PHQ-9 score and percentage of individuals with moderately severe and severe depression, showed the highest consumption of central nervous system and psychotherapeutic agents. The Severe mental depression cluster, despite showing an overrepresentation of severe responses to all mental questions in the PHQ-9, had the lowest consumption of psychotherapeutic agents, indicating undertreatment, a recognized problem (Thornicroft et al., 2017). This phenomenon might also be partially attributed to this group having the highest proportion of Blacks and Hispanics, who have been reported to have less access to psychotropic drugs (Cook et al., 2014). The Uniform severe depression cluster, which showed an overrepresentation of severe responses to all questions, including the somatic ones, showed the highest percentage of individuals with heart attacks, highlighting a possible undertreatment of somatic conditions (lower consumption of metabolic and cardiovascular agents compared to most clusters), which has been reported previously (Fagiolini and Goracci, 2009). Elsewhere, prescription drug use has been associated with higher severity scores of depressive symptoms (Kendrick et al., 2009; Yang et al., 2021), and a study using an NHANES sample found that the use of prescription drugs with depression as an adverse effect was also associated with greater odds for concurrent depression (Qato et al., 2018). Of note, on average, >40 % of those at least moderately depressed (i.e., our study sample) used metabolic or cardiovascular agents, and >60 % used central nervous system agents, highlighting a high degree of polypharmacy in depression (Palapinyo et al., 2021; Yuruyen et al., 2016).
Previous studies have shown a robust association between depression and chronic cardiometabolic diseases (Cohen et al., 2015; Hare et al., 2014; Martin et al., 2016). However, our findings suggest that the relationship between physical and mental health is not uniform; instead, it differs among subgroups with distinct depression profiles. Specifically, we found that individuals with severe mental depression have a higher burden of cardiometabolic diseases and might be undertreated for their mental illness. We also hypothesize that individuals with a uniform expression of severe depression might be undertreated for their somatic symptoms. These insights showcase the potential of data-driven latent subgroup identification to glean clinically relevant information. It is possible that depressive symptoms on the mental axis of the PHQ-9 instrument (such as feeling bad about oneself, feeling down or hopeless, having little interest in doing things, and suicidal thoughts) could lead to the development of cardiometabolic diseases due to altered hormonal or metabolic regulation, and/or difficulty adopting healthy lifestyles (Gold et al., 2020; Qiao et al., 2022; Tang et al., 2020). Alternatively, chronic cardiometabolic diseases could cause depression to arise in this population (Gold et al., 2020).
We highlight that individuals who score high on the somatic components of the PHQ-9 instrument but respond "0 = not at all" to all other questions might be misclassified as moderately depressed, when their responses merely reflect symptoms related to their somatic disease and sleep disturbances (hence the Severe somatic symptom profile cluster being the only one that we named without the word depression). As the PHQ-9 has been validated for clinical use and diagnosis in primary care (Cameron et al., 2008; González-Blanch et al., 2018), and previous studies have concluded that using the instrument's sum score is appropriate for evaluating depression severity (Stochl et al., 2022), categorizing individuals in the Severe somatic symptom profile cluster as moderately depressed raises concerns related to potentially unnecessary overtreatment.
While we are not able to establish directionality in our associations, previous evidence indicates that what we observe in our study might be the result of a bidirectional relationship where physical and mental illness exacerbate each other in a complex fashion (Egede, 2007; Hare et al., 2014; Jokela et al., 2019). Of note, a recent cohort study following individuals for 27 years identified associations between depression and a range of cardiometabolic diseases; however, these associations disappeared after the exclusion of individuals with somatic depression (Ditmars et al., 2021). As causal relationships cannot be determined using our study design, further studies are needed to disentangle the complex relationships between physical and mental conditions. Based on our results, we cautiously suggest a targeted assessment of individuals who score ≥ 10 on the PHQ-9 for patterns in their responses, and a thorough monitoring of somatic health status, including cardiometabolic diseases, in selected subgroups based on the response patterns.
A recently published study using NHANES data reported a strong negative association between depression (defined as PHQ-9 ≥ 10 vs. 0-9) and educational attainment (Li et al., 2022). Our study, focusing on those with at least moderate depression, shows more nuanced patterns; we observe that the Severe mental depression cluster has a higher proportion of individuals with a low level of education compared to the other clusters. Education, other socioeconomic factors, and health literacy are known to be associated with mental and chronic diseases on an individual level (Albert and Davia, 2011; Cantu et al., 2021; Silva et al., 2016); our findings confirm these patterns. Our results also highlight that while the association analysis of educational levels and the overall PHQ-9 score shows statistically significant findings in NHANES, our approach of investigating latent clusters of depression in the same population might be better suited to reveal particular data patterns to inform researchers and medical professionals about subgroups, and subsequently inform precision interventions.
A key strength of our study is the large, multi-ethnic adult population from the NHANES survey. Missing data were imputed in a multiple imputation framework, and all analyses were undertaken in 15 copies and subsequently pooled. We used the sample weights provided by NHANES and all analyses were undertaken in a weighted fashion, as recommended for studies with complex survey designs. We chose a clustering approach that could handle ordinal Likert-scale variables. We undertook a comprehensive association analysis between the identified clusters and sociodemographic and health-related variables and data on medication use.
We also declare some limitations. First, we used self-reported data, which are prone to various biases related to misreporting. Second, we used cross-sectional data; thus, we were unable to draw conclusions about the causal direction of the observed differences between depression clusters and sociodemographic and health-related characteristics. Third, we lack a large multi-ethnic sample with similar data for replication purposes.
In conclusion, we confirmed our first hypothesis by being able to identify clinically meaningful latent clusters of individuals with moderate to severe depression, based on the PHQ-9 instrument. Our second hypothesis was also confirmed, as we identified several sociodemographic and health-related characteristics that differed between the identified clusters. Based on these findings, we argue for a further investigation of the complex interrelationship of physical and mental conditions. In addition, we show a promising example of the identification of clinically relevant subgroups, namely, latent clusters of individuals with moderate to severe depression who present with distinct combinations of mental and/or somatic symptoms of depression, with different comorbidity profiles.

CRediT authorship contribution statement
The corresponding authors attest that all listed authors meet the ICMJE authorship criteria and that no others meeting the criteria have been omitted. JDA and TVV performed the statistical analysis. TVV drafted the manuscript. JDA and TVV created the visualizations. TVV performed a literature review for the project. AGP, JDA, NHR, and TVV were responsible for the conceptualization of the project. TVV was responsible for project management. TVV is the guarantor of this manuscript. All listed authors interpreted the results, reviewed and edited the manuscript, verified the data, and approved the final, submitted version.

Funding
Aina Gabarrell-Pascuet's work is supported by the Secretariat of Universities and Research of the Generalitat de Catalunya and the European Social Fund (2021 FI_B 00839). Joan Domènech-Abella has a "Juan de la Cierva" research contract awarded by the Spanish Ministry of Science and Innovation (MCIU: FJC2019-038955-I). Tibor V. Varga is supported by the "Data Science Investigator - Emerging 2022" grant from the Novo Nordisk Foundation (NNF22OC0075284).

Fig. 1. Hierarchical clustering of the nine PHQ-9 variables (N = 2900). The nine questions were coded as categorical variables and hierarchical clustering was undertaken using the hclustvar() function of the ClustOfVar R package using default settings. The tree was cut arbitrarily based on the authors' interpretation.

Fig. 4. Prevalence of the use of four prescription medication groups among medication users in the six latent clusters of individuals with depression (N = 2080). For this analysis, the population was narrowed down to those who report the use of at least one prescription medication. Weighted proportions and 95 % confidence intervals were calculated.
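The weighted proportions reported in the figures can be illustrated with a minimal Python sketch. It shows only the point estimate: each participant contributes their survey weight rather than a count of one. The function name and the example values are hypothetical, and the 95 % confidence intervals are deliberately omitted, because they require design-based variance estimation that uses the NHANES strata and sampling units.

```python
def weighted_proportion(indicators, weights):
    """Share of the total survey weight held by participants with indicator 1.

    indicators: 0/1 flags (e.g., uses a given medication class)
    weights:    survey weights, one per participant
    """
    if len(indicators) != len(weights):
        raise ValueError("indicators and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("total weight must be positive")
    flagged = sum(w for x, w in zip(indicators, weights) if x)
    return flagged / total_weight

# Hypothetical example: three participants in one cluster.
share = weighted_proportion([1, 0, 1], [1.0, 2.0, 1.0])
```

In this example the unweighted proportion would be 2/3, whereas the weighted proportion is 0.5, because the non-user represents more people in the target population.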

Table 1
Descriptive characteristics in the total population and across clusters (n = 2900).
P values are calculated using chi-squared tests for categorical variables and ANOVAs for continuous variables.

Table 2
Lifestyle characteristics and health conditions in the total population and across clusters (n = 2900).