Characteristics associated with self-rated health in the CARDIA study: Contextualising health determinants by income group

An understanding of factors influencing health in socioeconomic groups is required to reduce health inequalities. This study investigated combinations of health determinants associated with self-rated health (SRH), and their relative importance, in income-based groups. Cross-sectional data from year 15 (2000 − 2001) of the CARDIA study (Coronary Artery Risk Development in Young Adults, USA) - 3648 men and women (mean 40 years) - were split into 5 income-based groups. SRH responses were categorized as ‘higher’/‘lower’. Health determinants (medical, lifestyle, and social factors, living conditions) associated with SRH in each group were analyzed using classification tree analysis (CTA). Income and SRH were positively associated (p < 0.05). Data suggested an income-based gradient for lifestyle/medical/social factors/living conditions. Profiles, and relative importance ranking, of multi-domain health determinants, in relation to SRH, differed by income group. The highest ranking variable for each income group was chronic burden-personal health problem (<$25,000); physical activity ($25–50,000; $50–75,000; $100,000 +); and cigarettes/day ($75–100,000). In lower income groups, more risk factors and chronic burden indicators were associated with SRH. Social support, control over life, optimism, and resources for paying for basics/medical care/health insurance were greater (%) with higher income. SRH is a multidimensional measure; CTA is useful for contextualizing risk factors in relation to health status. Findings suggest that for lower income groups, addressing contributors to chronic burden is important alongside lifestyle/medical factors. In a proportionate universalism context, in addition to differences in intensity of public health action across the socioeconomic gradient, differences in the type of interventions to improve SRH may also be important.


Introduction
The socioeconomic gradient in health is well recognized. Knowledge of differences in characteristics associated with good or poor health in socioeconomic groups is important to inform appropriate interventions, and improve health status across the gradient. Health status is a complex construct. The health implications of a single risk factor or exposure may not be universally identical; that is, health status would depend on interaction with coexisting variables, so that different combinations of risk and protective factors produce different outcomes. A solitary focus on single risk factors overlooks the combined impact of these multi-domain influences on health status (Marmot et al., 1998; Ostlin et al., 2005). The WHO Task Force on Research Priorities for Equity in Health called for research studying the "interrelationships between individual factors and social context that increase or decrease the likelihood of achieving and maintaining good health" (Ostlin et al., 2005).
SRH is a common measure of global health status, and an independent predictor of subsequent morbidity and mortality (CDCa,b,c, 2016; Idler and Benyamini, 1997; Moller et al., 1996; ONS). A high proportion of a population reporting good SRH is in itself an important end point. Studies have identified independent determinants of SRH from diverse domains, including demographic, lifestyle, and socio-environmental factors, and physical and mental health status; higher education and income are associated with better SRH status (Franks et al., 2003; Kunst et al., 2005; Mackenbach, 2005; Manderbacka et al., 1999; McFadden et al., 2008; Molarius et al., 2007; Shields, 2008; Shields and Shooshtari, 2001; Singh-Manoux et al., 2006). Adult SRH is also influenced by early-life factors (e.g. social circumstances at birth and school qualifications) (Power et al., 1998). The potential modifying effect of socioeconomic status (SES) on the relationship between objective health and SRH has been explored in earlier studies, with inconsistent findings (Delpierre et al., 2009, 2012; Dowd and Zajacova, 2010; Onadja et al., 2013; Singh-Manoux et al., 2007). There is also evidence suggesting SES does not modify the association between SRH and mortality, and that the influence of health-related predictors is similar across socioeconomic groups (Burstrom and Fredlund, 2001; McFadden et al., 2009; Smith et al., 2010). Such inconsistencies may in fact result in an underestimation of health inequalities (Delpierre et al., 2009; Dowd and Zajacova, 2010; McFadden et al., 2009; Singh-Manoux et al., 2007). SES may affect expectations of health and risk, the factors considered in assessing subjective health, or their relative weighting. Socioeconomic circumstances can determine the range of factors pertinent to health; we explore this further, in the context of income, in the present study.
In addition to adverse childhood circumstances, a greater prevalence of adverse material circumstances, unhealthy behaviors and psychosocial factors is important in explaining health inequalities (van Lenthe et al., 2004). Lifestyle choices are rooted in socioeconomic context. In targeting factors such as physical exercise, smoking or alcohol consumption, there is value in understanding the concurrent upstream factors that might influence or restrict these choices (Marmot et al., 1998). Meyer et al., for example, found low SES linked to greater neighborhood safety concerns; these were negatively associated with physical activity, which in turn was negatively linked with mental health and SRH (Meyer et al., 2014). Thus, in low SES groups, acting primarily on physical activity levels without addressing the contextual factors that influence them may not affect health status. Mitigation of cumulative adverse effects requires a multi-level and multi-dimensional approach to intervention (Morello-Frosch et al., 2011; Wen et al., 2006).
Given the complexity of the socioeconomic gradient in health, Adler et al. discussed conceptual and methodological issues constraining earlier research on SES and health; one such issue is the limited ability of parametric multivariate regression to capture a large number of multi-domain interrelated variables, and fully unravel the mechanisms that might contribute to the gradient (Adler et al., 1994). Classification tree analysis (CTA), a form of recursive partitioning, provides an alternative approach with several advantages: this non-parametric technique is valuable for studying a complex set of predictor variables and large sample sizes; it is data-adaptive, handles high dimensionality, a mixture of data types, and non-standard data structure, while providing insight into the predictive structure of the data (Breiman et al., 1984). Tree-based methods have been used to partition individuals and establish high risk groups by clinical signs and symptoms (Kershaw et al., 2007); they may also uncover interactions potentially overlooked in logistic regression, unless modeled a priori (Forthofer and Bryant, 2000; Lemon et al., 2003; Nelson et al., 1998).
The aim of this study is to apply CTA to investigate combinations of multi-domain health determinants associated with self-rated health (SRH), and conduct an exploratory analysis of their combinations and relative importance in income-based groups. The factors considered represent multiple influences from the social-ecological model of health; a fundamental aim of the study was to contextualize these multi-domain factors, and study their potential joint impact and interactions. It is unclear whether the relative importance of risk factors associated with health status remains the same across income-based groups. We propose these would vary based on interaction of lifestyle choices, psychosocial factors, and living and working conditions, influenced by socioeconomic context. An understanding of these differences is important in planning interventions to improve health status, and reduce income-based health inequalities.

Methods
This study utilized cross-sectional data collected by the CARDIA longitudinal study (Coronary Artery Risk Development in Young Adults), started in 1985 with a cohort of 5115 men and women aged 18-30 years (1.1% of participants were 17-35 years), recruited in Birmingham, Alabama; Chicago, Illinois; Minneapolis, Minnesota; and Oakland, California. For this study, data were taken from the year 15 follow-up, conducted 2000-2001 (except race/ethnicity, 1985-1986, and family history, 1995), through interviewer and self-administered questionnaires, to examine associations between SRH and many health determinants assessed for adults (mean age 40 years) in that year. From 5115 participants at baseline, 3672 were followed up in year 15; participants with a response for SRH, coded as male or female, were included in the study sample of 3648 participants.

Study variables
The outcome, SRH, was assessed on a 5-point scale: "In general would you say your health is excellent, very good, good, fair or poor?" Responses of poor/fair/good were grouped as 'lower' SRH. Responses of very good/excellent were grouped as 'higher' SRH, as they were more definite positive statements of better health; respondents may have regarded a response of good, being the centre of a 5-point scale, as a neutral or 'average' value. This grouping also resulted in more equal group sizes. In a previous study, fair and good self-ratings of health were associated with higher mortality, so that risk was not associated solely with the poor group, but a gradient was observed (Idler et al., 1990).
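The grouping rule described above can be written down compactly. The following is only an illustrative sketch in Python (the study itself used SPSS); the function name and the example responses are hypothetical and are not taken from the CARDIA data.

```python
# Illustrative recoding of the 5-point SRH item into two groups.
# The helper name and sample responses are assumptions for this sketch.
LOWER = {"poor", "fair", "good"}
HIGHER = {"very good", "excellent"}


def dichotomize_srh(response: str) -> str:
    """Map a 5-point SRH response to the 'higher'/'lower' grouping."""
    key = response.strip().lower()
    if key in HIGHER:
        return "higher"
    if key in LOWER:
        return "lower"
    raise ValueError(f"unrecognized SRH response: {response!r}")


responses = ["Excellent", "good", "Fair", "very good", "poor"]  # made-up values
grouped = [dichotomize_srh(r) for r in responses]
print(grouped)
```

Note that 'good' falls in the 'lower' group under this rule, which is the choice discussed in the limitations.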
Predictor variables used in the analysis (appendix, Table 1A) represented multiple domains and a range of health determinants based on the social-ecological model of health: age, sex, and hereditary factors; individual lifestyle factors and medical history; social and community influences; living and working conditions (Gebbie et al., 2003; Dahlgren and Whitehead, 1991).

Statistical analyses
The study sample was split into 5 groups based on respondents' total family income. The Mantel-Haenszel chi-square test for trend was used to assess the relationship between predictor variables and ordinal income categories. Continuous variables were analysed using the Kruskal-Wallis test.
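As a rough illustration of the trend test named above, the sketch below computes the Mantel-Haenszel linear-by-linear statistic, M² = (N − 1)r², with 1 degree of freedom, where r is the correlation between row and column scores. It is written in plain Python rather than SPSS; the function name, the integer scores assigned to categories, and the counts are all assumptions made for the example, not study data.

```python
import math


def mh_trend_test(table, row_scores, col_scores):
    """Mantel-Haenszel linear-by-linear test: M^2 = (N - 1) * r^2, df = 1."""
    n = sum(sum(row) for row in table)
    # Weighted means of the row and column scores.
    mean_r = sum(row_scores[i] * sum(row) for i, row in enumerate(table)) / n
    mean_c = sum(
        col_scores[j] * count for row in table for j, count in enumerate(row)
    ) / n
    sxy = sxx = syy = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            dx = row_scores[i] - mean_r
            dy = col_scores[j] - mean_c
            sxy += count * dx * dy
            sxx += count * dx * dx
            syy += count * dy * dy
    r = sxy / math.sqrt(sxx * syy)
    m2 = (n - 1) * r * r
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(m2 / 2.0))
    return m2, p_value


# Hypothetical counts: rows are lower/higher SRH, columns are 5 income groups.
trend_table = [[60, 50, 40, 30, 20],
               [40, 50, 60, 70, 80]]
flat_table = [[50, 50, 50, 50, 50],
              [50, 50, 50, 50, 50]]
m2, p = mh_trend_test(trend_table, [0, 1], [1, 2, 3, 4, 5])
m2_flat, p_flat = mh_trend_test(flat_table, [0, 1], [1, 2, 3, 4, 5])
print(round(m2, 2), p < 0.05)
```

Equally spaced scores (1 to 5) are one common choice for ordinal categories; other scorings would give a different statistic.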
For each income-based group, CTA was run using all predictor variables, excluding total family income, to segment the group into smaller mutually exclusive subgroups of individuals, and identify predictor variables associated with the outcome measure, SRH. At every node of the tree model formed in the analysis, the sample of individuals was split based on the predictor variable that maximised the goodness of split function, i.e. resulted in the largest decrease in impurity of the prior 'parent' node (a node that is split further into subgroups), in terms of distribution of SRH status. A ranking of predictor variables was based upon normalized importance, ranging from 0 to 100, with the variable with the greatest relative measure of importance scored at 100, and other variables scored in the range 0 to 100 (Breiman et al., 1984). Tree growing criteria were set to a minimum 'parent node' size of 20 (individuals) and 'child node' size of 10 (a child node is a subgroup formed from splitting of a parent node). Data were analysed using IBM SPSS Statistics (SPSS v21).
Distribution of study variables by income-based group (Table 1)
SRH: The proportion of 'higher' SRH increased with income (p < 0.05).
Sex, race/ethnicity and hereditary factors: The proportions of males and whites increased with income, and there was an inverse income gradient for the proportion of respondents with a family history of diabetes, stroke, maternal high blood pressure, maternal angina, and maternal heart attack. Respondents in higher income groups were older (Table 2).
Lifestyle factors and medical history: Medical conditions showed a significant inverse income gradient, except for heart disease; high cholesterol increased with income. The chi-square test was significant (p < 0.05) for the relationship between physical activity and income. The lowest income group (<$25,000) had a larger proportion of respondents who rated themselves physically inactive compared to the highest income group ($100,000 plus). Wine consumption increased with income; fast food and beer/hard liquor consumption decreased with income (Table 2).
Social and community influences: In higher income groups, larger proportions of respondents reported good social support and a sense of neighborhood cohesion (Table 1).
Living and working conditions: The proportion of home ownership increased with income, and unemployment decreased. There was an inverse income gradient for the proportion of respondents reporting difficulty in paying for basics and medical care, and not seeking medical care due to cost. The proportion of respondents with health insurance over the previous two years increased with higher income. Respondents in higher income groups were also more likely to be optimistic for the future, report a sense of control over life events, and be less likely to report feeling helpless in dealing with life problems (Table 1). There were significant associations between 3 indicators of chronic burden and income category by chi-square test for independence (p < 0.05) (chronic burden due to serious personal health problem; financial strain; or difficulties in a close relationship).

Classification tree analysis by income-based group
Under $25,000 (Fig. 1): Lower SRH: 62.5% (n = 361). Three subgroups were classed as higher SRH (55.6% to 87.5%), and 6 as lower SRH (58.1% to 95.7%). Chronic burden due to serious personal health problem ranked highest in association with SRH status. Family history of heart disease; medical conditions (high BP, diabetes, heart disease, mental/nervous/emotional disorder, and depression); all lifestyle factors; ability to rely on friends/family; and all chronic burden indicators were also related to SRH status.
$25,000 to $50,000 (Fig. 2): Higher SRH: 50.5% (n = 460). Twelve subgroups were classed as lower SRH (highest 96.7%), and 6 as higher SRH (largest 92.9%). Physical activity ranked highest in association with SRH. Nervous/emotional/mental disorder; chronic burden due to (a) serious personal health problem, (b) ongoing financial strain, and (c) close relationship difficulties; cigarettes/day; neighborhood support; social class discrimination getting housing, education; paternal diabetes, and alcohol were all related to SRH status.
$100,000 plus (Fig. 5): Higher SRH: 77.0% (n = 614). Two subgroups had predominantly lower SRH (highest 70.4%) and 3 had predominantly higher SRH (largest 93.7%). Physical activity was ranked highest in association with SRH (Table 3). High blood pressure and chronic burden due to serious personal health problem were also related to SRH status.
All other multi-domain factors associated with SRH for each income-based group are shown in Table 3, ranked by normalized importance.

Discussion
In this study, classification tree analysis reflects SRH as being a multifactorial measure. Consistent with previous studies, prevalence of higher SRH increased with higher income; there was an income gradient for several health determinants relating to lifestyle and medical factors, social and community influences, and living conditions. However, in addition, the results suggest a meaningful variation in the combinations and relative importance of these risk and protective factors associated with health status across the income gradient. Within each income-based group, we also identified smaller subgroups with similar SRH status but associated with differing combinations of risk factors (Figs. 1-5). Table 3 highlights the greater range of multi-domain factors linked to SRH status, with lower income. In the lowest income group, all 5 chronic burden indicators were associated with SRH; family history of heart disease, medical and lifestyle factors, and ability to rely on friends/family were also associated with SRH. In the $25-50k income group, 4 of the chronic burden indicators were associated with SRH; neighborhood factors and social support indicators were relevant too, in combination with medical and lifestyle factors, with physical activity highest ranked. For the 2 highest income categories, lifestyle factors (physical activity and smoking) were highest ranked, and there were fewer chronic burden indicators in the model. Based on these results, we suggest that taking account of these differences in risk factor profiles between groups would be important for health promotion; interventions need to be tailored to address priorities based on income or socioeconomic context, and relative importance of interacting multi-domain factors.
Chronic burden from a serious personal health problem was associated with SRH across all income groups with varying relative importance (highest ranked in the lowest income group). In the CARDIA study, this variable relates to the experience of strains over 6 months, and so reflects not simply presence of a medical condition, but associated burden or stress; this may be due to medical costs, access to services, lack of social support in dealing with illness, and the impact on daily life. In populations with illness or disability, resources such as a sense of high mastery, greater self-esteem or social support have been shown to associate with better SRH (Bosworth et al., 1999; Cott et al., 1999). Social support, positive social relationships, optimistic outlook on life, perceived control over life outcomes, and sense of purpose and direction in life, have been identified as health protective factors; psychosocial factors also influence positive health behaviors (WHO, 2002). In the study sample, lower income groups had additional sources of chronic burden (e.g. from financial strain; difficulties in close relationships; and job or ability to work) associated with SRH. Higher income groups had higher prevalence of individuals reporting: good social support and sense of neighborhood cohesion; availability of health insurance, resources for basics and medical care; optimism; sense of control; and a lower prevalence of feeling helpless in dealing with life problems. This suggests a possible protective or buffering role of these factors from the associated burden or stress of health problems.

Table 3. Combinations of health determinants and variable ranking associated with self-rated health by income-based group, the CARDIA study, year 15, USA. Columns: variable; normalized independent variable importance (%).
Limitations to the present study include the cross-sectional study sample, the self-reported nature of predictor variables, household income unadjusted for occupancy, and potential of health selection in use of income as the marker of SES. However, as choice of socioeconomic indicator depends on the postulated mechanisms by which it affects health (Lynch and Kaplan, 2000), the use of income is justified, as the focus is on SES in terms of disparities in material resources, and access to resources. Higher versus lower SRH status is compared in this initial study, though further detail, and more complex tree models, would be produced by maintaining 5 response categories for SRH status. Classifying responses of good, fair and poor into one category may be viewed as a limitation if health determinants associated with good SRH are more similar to very good/excellent health in this sample. It is not possible to make further assumptions regarding importance of individual ranked factors based on a single tree classification, and no inference exists for the difference of the variable importance ranking between income groups. This is an exploratory data analysis, and further research is needed to confirm the importance of these factors. However, for each income-based group, the normalization is used to obtain comparable measures of importance that are scale-free, and thus allow a sensible ranking of the predictor variables. This is in order to get an idea of relative importance of the factor in relation to the outcome; this has relevance for prioritizing and targeting action to improve health status.
A broad range of variables is included in the analysis as SRH is influenced by several different factors, and the analytic method used can handle this set of predictors. Some variables that are correlated could be eliminated further by additional investigation, though correlation could also vary by income-based subgroup. Even so, the analytic approach used here is a data-adaptive method that is particularly useful when there are a large number of potential explanatory factors, as it allows the most important set of predictors among a large set of candidates to be identified. Recognizing that these are exploratory analyses, it may be inferred that in considering the design of surveys or interventions related to self-reported health, priority could be given in low-income individuals to factors impacting chronic burden related to serious personal health problem (e.g., access to medical care, medical insurance, and lack of support), since this is the top node in the tree model for that group. For upper-income individuals, priority could be given to factors affecting physical activity, since this is the top node in their tree model.
Contextualising risk factors, by considering the clustering and joint impact of different health determinants, is important for action on the "fundamental factors that put people at risk of risks" (Link and Phelan, 1995). This concept is analogous to the framework used by infectious disease epidemiologists, which considers the agent, host, and susceptibility, or environment. Differential impacts on health occur also because low-income groups are more likely to be exposed to multiple hazardous risk factors simultaneously, and a convergence of environmental hazards and social stressors, which contribute to poor health outcomes, and inequalities (Dahlgren and Whitehead, 2007; Marmot et al., 1998; Morello-Frosch et al., 2011). High-income groups have more resources providing protection from disease and its consequences (Link and Phelan, 1995). Therefore, a health promotion intervention appropriate for one group is not necessarily optimal for another.
Factors such as physical activity and smoking were important for all income groups in our study. To improve health status for lower income groups, also addressing factors contributing to chronic burden, mental illness, neighborhood problems, and lack of social support may be beneficial, and in turn have an effect on lifestyle factors. The analysis unifies multidomain and multilevel factors; the results imply the need for collaborative approaches to improve SRH. Both policy and population-level public health interventions are required to address upstream health determinants, alongside medical and health promotion efforts tackling individual medical or lifestyle factors. Friel et al. (2005) applied CTA to study profiles (socio-demographic, socioeconomic factors and health-related lifestyle behaviors) of adults complying with fruit and vegetable dietary recommendations, based on the rationale that food choice is complex and influenced by economic, social and environmental context. Relative importance of social characteristics in predicting fruit and vegetable consumption differed by gender; these results were considered to have implications for setting dietary strategies and policy. BeLue et al. applied classification and regression tree analysis to study obesity-related risk profiles in a sample of US adolescents (BeLue et al., 2009). Obesity-related risk and protective factors differed among sociodemographic groups, and in their relative importance to adolescent overweight status. These results, and those from the present study, demonstrate that in terms of public health intervention, often "one size does not fit all"; improving health requires a multi-level and multi-dimensional approach to intervention, which takes account of the complexity and diversity of risk factor profiles in different subgroups (BeLue et al., 2009; Morello-Frosch et al., 2011; Wen et al., 2006).
Adler and Stewart's review discussed the eras through which research on SES and health has progressed (Adler and Stewart, 2010). The first model of a threshold effect between poverty and health was refined following evidence of a graded association. Subsequent eras studied mechanisms linking SES and health, and considered multilevel influences, and most recently, interactions among factors. This study utilizes a CTA approach to investigating the factors associated with SRH in income-based groups, and adds to the work on the joint impact of health determinants on SRH. Multivariate logistic regression models may demonstrate only the average relationship between predictor and outcome over the population. CTA is a useful segmentation technique to suggest population subgroups that might have homogeneous risks of an outcome, and identify the relative importance of associated risk and protective factors for further inquiry (Forthofer and Bryant, 2000).
The Marmot report, Fair Society Healthy Lives (on evidence-based strategies for reducing health inequalities in England), recommended proportionate universalism to reduce the steepness of the socioeconomic gradient in health (Marmot, 2010). This requires a universal approach to public health action but with a scale and intensity proportionate to level of disadvantage. Our study results suggest potentially important differences in factors associated with SRH among income-based groups. In the context of a proportionate universalism approach to reducing health inequalities, the findings imply that as well as differences in the intensity of public health action required across the gradient, differences in the type of actions to improve SRH may also be important.

Human subjects statement
The Coronary Artery Risk Development in Young Adults (CARDIA) study was approved by the institutional review boards at all study sites including Northwestern University, University of Alabama at Birmingham, University of Minnesota, and Kaiser Permanente. Study approval was granted by the Committee for Protection of Human Subjects at the University of California, Berkeley.

Contributors
SN devised the original idea for the study, conducted data analysis and wrote the preliminary draft of the paper. SLS contributed to study conception. AH contributed to data analysis, and AH and SS to interpretation of data. All authors developed and revised the manuscript and approved the final version.