Introduction

A fundamental component of both descriptive and analytic epidemiologic inquiry is a basic description of individual-level characteristics of the population under study. Identifying patterns of risk factor and disease distributions by sociodemographic factors such as sex, gender, race, ethnicity, and socioeconomic position (SEP) can be critical for the targeted distribution of limited resources.1,2,3,4 Understanding the effects of individual sociodemographic factors on health outcomes has been invaluable in elucidating intervention opportunities for both risk factors and diseases.5,6,7,8,9 Although nearly every published epidemiologic and public health study in the past several decades has included sociodemographic information in both descriptions and analytic models, there are few studies specifically focused on child health outcomes addressing issues with the definitions, contextualization, measurement, or appropriate use of these variables.10,11,12,13,14,15,16,17

Inclusion of sociodemographic variables in child health studies can be particularly complex and problematic.6 The measurement and contextualization of these factors in a child are heavily influenced by the individual characteristics of the child’s parents, siblings, caregivers, and others.18,19 In addition, the influences of exposures on outcomes in children are affected by stages of growth and development; consideration of the child’s life stage within his/her life course is an important part of child health inquiry.20,21 Finally, young children spend a majority of their time in the home; as they grow older, they spend significant amounts of time in and around school, as well as in their neighborhood. These significant changes in a child’s physical environment are accompanied by differential sociodemographic contexts that can influence a child’s health and well-being.22,23,24,25 Achieving consensus and consistency in the characterization of a child’s sociodemographic environment has been a challenge in the child health literature.

For nearly 20 years, the National Institutes of Health (NIH) has required that large studies (both cohort studies and clinical trials) make data available for use by outside investigators.26 In addition, in this era of “Big Data,” researchers often use secondary data made available by agencies both within and outside the health sector.27 While researchers often cannot control how sociodemographic constructs within these datasets were defined or collected, they still bear considerable responsibility for demonstrating the appropriate contextualization of these factors in the scholarly research that they put forth.

In this paper, we aim to summarize the major issues that have been noted to date with the use of the core and most frequently used sociodemographic variables in child health research, including sex, gender, race, ethnicity, and SEP, and put forth recommendations on how researchers can advance this field. We discuss the importance of establishing a conceptual framework for how sociodemographic factors influence the exposure(s) and outcome(s) in question, and of using this framework to decide if and how those variables should be included in an analysis. We then review the current knowledge related to the definition, analysis, and interpretation of those sociodemographic constructs. Finally, we put forth recommendations (Table 1) on how researchers can play a role in filling long-standing gaps in our understanding of how to best incorporate sociodemographic constructs in child health research.

Table 1 Summary of recommendations for inclusion of sociodemographic factors in child health studies.

Conceptual frameworks

There are well-documented associations between sociodemographic characteristics and health outcomes. Race, ethnicity, gender, and socioeconomic status are well-studied examples. However, less is known about why these factors are salient for health or how these factors work together to frame health.13,28 Using conceptual frameworks can elucidate the role of sociodemographic factors in analyses of child health outcomes. Conceptual frameworks guide what variables are included in analytic models, and how these variables are included. Conceptual frameworks illustrate the theorized relationships between the sociodemographic and other variables in an analysis. They summarize a researcher’s understanding of previous knowledge and application of theory as it relates to a specific topic area or research question. An effective conceptual framework conveys the scope, levels, and key constructs of interest. Conceptual frameworks are useful tools for visually communicating complex areas of research and for guiding analytical decision-making. In practice, conceptual frameworks vary from broad, all-encompassing visualizations of entire research fields or theoretical frameworks to specific illustrations of a single research project.

Unfortunately, conceptual frameworks are often not explicitly stated in epidemiologic research, and sociodemographic variables are often “adjusted for” without a thought about their role and relationship to other variables.29,30 Factors such as race, ethnicity, sex, and SEP are often included without consideration of whether these factors are confounders or descendants of confounders, mediators, or moderators. Each of these types of variables requires different analytic treatment, and inferences may be spuriously affected if a variable is not handled correctly.31,32 For example, from a conceptual standpoint, adjusting for, or holding everything equal except for, one’s self-reported race becomes meaningless in a racialized society in which race shapes access to all aspects of a person’s life and health.33,34 From a statistical perspective, including a binary indicator of race in a regression analysis does not allow researchers to make inferences about differences in the exposure–outcome relationship across racial groups, illustrating the need for careful consideration of the research question, analysis, and interpretation.35,36,37

Sex and gender

The terms “sex” and “gender” refer to two distinct but interrelated concepts that are recognized globally as core social determinants of health across a wide variety of geographic settings.38,39 There have been fairly extensive discussions in the literature aiming to clarify these as two distinct concepts and calling for standardized use of these terms.40,41,42,43,44 Traditionally in the literature, the terms sex and gender have been conflated to represent a combined construct of biological characteristics and cultural expression, and the categorization has been binary despite evidence that there are more than two sexes.45,46 However, by definition, sex refers to the set of biological attributes in humans and animals that are associated with specific physical and physiological features, including relevant chromosomes, gene expression, hormone function, and reproductive/sexual anatomy.39 Gender refers to the set of cultural meanings ascribed to or associated with patterns of behavior, experience, and personality that are labeled as feminine or masculine.47 Sex and gender have separate although often interactive and synergistic effects on health, illness, well-being, and experience with the health care system.44 Therefore, they are inappropriate proxies for one another, and should be measured and analyzed distinctly.48 It is also important to note that sex and gender are distinct from sexual orientation or sexual attraction. Although the measurement and analysis of sexual orientation, which describes romantic or sexual attraction, is not discussed in this paper, it is important to consider within the context of child health research.

Many experts have recommended that, in accordance with its strictest definition, sex be conceptualized as a binary factor, and that the terms “male” and “female” be used in its description.40 However, this has been called into question as more research has been done into intersex conditions, grouped under the term Disorders of Sex Development, which have a prevalence of up to 1 per 100 persons.49 There are more than 20 conditions included in the categorization of Disorders of Sex Development, the more commonly known of which include Congenital Adrenal Hyperplasia, Gonadal Dysgenesis, Androgen Insensitivity Syndrome, Turner Syndrome, and Klinefelter Syndrome. Individuals with these conditions would not fit into a binary definition of sex, and therefore sex likely requires a third category in research work; studies that only capture sex as “male” or “female” would fail to accurately capture or represent intersex persons.50

From a methodologic standpoint, there have been some published recommendations regarding the inclusion of sex in the analysis of research findings.46,51 In 2016, the NIH released the Sex as a Biological Variable (SABV) policy, which states that “NIH expects that SABV will be factored into research designs, analyses, and reporting in vertebrate animal and human studies.”45 The SABV policy suggests that regardless of whether the study was powered to detect sex differences, data should be disaggregated to explore any differences that could be obscured when data from males and females are pooled, and therefore that key relationships between the exposure and outcome should be analyzed for males and females separately.45 Researchers have noted that when sex is included in models in the epidemiologic literature, it is for the most part treated as a confounder, thus neglecting its potential role as an effect measure modifier.52,53,54 Importantly, investigators should use their conceptual framework to determine what about sex differences is important in the analysis. If there are underlying characteristics that traditionally differ by sex, then those should be measured and analyzed directly without using sex as a loose proxy.55
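To illustrate how pooling can obscure sex-specific differences, the following minimal Python sketch compares sex-specific and pooled risk ratios. The counts are invented purely for illustration and do not come from any study; the names `counts`, `risk_ratio`, and `pooled_counts` are hypothetical. The example is deliberately extreme, with the exposure associated with higher risk in one stratum and lower risk in the other.

```python
# Hypothetical counts, invented for illustration only (not from any study).
# Each entry is (cases, total) for one exposure arm within a sex stratum.
counts = {
    "male":   {"exposed": (30, 100), "unexposed": (10, 100)},
    "female": {"exposed": (10, 100), "unexposed": (30, 100)},
}


def risk_ratio(exposed, unexposed):
    """Risk among the exposed divided by risk among the unexposed."""
    exp_cases, exp_total = exposed
    unexp_cases, unexp_total = unexposed
    return (exp_cases / exp_total) / (unexp_cases / unexp_total)


def pooled_counts(strata, arm):
    """Sum cases and totals for one exposure arm across all strata."""
    cases = sum(stratum[arm][0] for stratum in strata.values())
    total = sum(stratum[arm][1] for stratum in strata.values())
    return cases, total


# Disaggregated analysis: one risk ratio per sex stratum.
stratum_rr = {
    sex: risk_ratio(stratum["exposed"], stratum["unexposed"])
    for sex, stratum in counts.items()
}

# Pooled analysis: sex is ignored entirely.
pooled_rr = risk_ratio(
    pooled_counts(counts, "exposed"),
    pooled_counts(counts, "unexposed"),
)

for sex, rr in stratum_rr.items():
    print(f"{sex}: risk ratio = {rr:.2f}")
print(f"pooled: risk ratio = {pooled_rr:.2f}")
```

In this constructed example, the pooled estimate suggests no association even though the exposure is associated with the outcome in opposite directions within each stratum. Real data will rarely be this stark, and stratum-specific estimates from underpowered studies should be interpreted with appropriate caution.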

In contrast, gender is more commonly recognized as a multidimensional construct that includes gender identity, gender expression, and gender label (applying a name and definition to one’s gender identity and expression).10 Gender identity, according to the Institute of Medicine’s 2011 report, “refers to a person’s basic sense of being a man or boy, a woman or girl, or another gender (e.g., transgender, bigender, or gender queer—a rejection of the traditional binary classification of gender).”47 Gender expression “denotes the manifestation of characteristics in one’s personality, appearance, and behavior that are culturally defined as masculine or feminine.”47 The most commonly accepted and used construct in both measurement and analysis in research is gender identity, generally referring to how an individual perceives their own gender.10

The naming or categorization of gender identity remains inconsistent in the literature. Gender minority is an “umbrella” term that refers to transgender and gender-nonconforming people, that is, people whose current gender identity or gender expression does not conform to social expectations based on their sex assigned at birth.47 Studies suggest that gender-typical as well as transgender children as young as age 3 years can reliably identify their gender.56 The Centers for Disease Control and Prevention’s Youth Risk Behavior Surveillance System, the Center of Excellence for Transgender Health, and the US Department of Education’s School Climate Survey each measure gender identity among adolescents and youth. Each of these categorizes gender identity similarly, using “man,” “woman,” “transgender man,” “transgender woman,” “gender nonconforming,” and “other.”10,50 However, given the rapid evolution of awareness, knowledge, and exposure in society, recommendations on appropriate and acceptable terminology continue to evolve.57

There is relatively little published guidance related to the appropriate inclusion of gender identity in analytic models. The fundamental consideration is the specific research question, and which construct (sex, gender identity, or both) most appropriately measures and characterizes what the question aims to answer. Nowatzki and Grant43 argue that sex disaggregation alone is insufficient to understand gender-based contexts of health services, because it means that differences in social, political, and economic power between individuals of different gender identities, and the health consequences of those inequalities, are not addressed. They concluded that regardless of the methodological approach taken, it is possible to do both a sex- and gender-based analysis, provided that appropriate indicators are incorporated into the data collection instruments.43 Questions remain regarding how both of these variables can be used in the same model, given the collinearity between the two constructs.46,51,55

In summary, although there is growing recognition of the need to separate constructs of sex and gender in epidemiologic inquiry and some recommendations for the importance of including sex differences in analytic models, there remain several open questions and inconsistencies in how to define and categorize sex as well as gender identity and how to appropriately incorporate both of these constructs in child health research.

Race and ethnicity

Race and ethnicity are now widely acknowledged as two rapidly evolving and poorly defined constructs; however, this was not always the case. The term “race” was first used to refer to genetically distinct groups within a species. However, our current-day uses of the term race do not reflect genetically distinct groups, but instead focus on taxonomies of human groups based on perceived physical characteristics and geographic origin.58 Race, as currently conceived, is a poorly defined marker for biologic and genetic variation found across all humans, as there is greater genetic variation within racial groups than across racial groups.9,59,60,61 For researchers interested in groups that are genetically similar, examining genetic ancestry is more appropriate, as it is based on populations that are geographically, culturally, and linguistically similar over time; however, groups defined by genetic ancestry are not equivalent to the socially and politically designated race groups.62,63

Ethnicity is used to classify human populations based on shared culture and way of life, especially as reflected in language, folklore, religious and other institutional forums, material objects such as clothing and food, and cultural products such as music, literature, and art.9 Although race and ethnicity have different meanings, the conceptual confusion between them within the research literature emerged as early as 1978, when the US Office of Management and Budget created “race/ethnicity” as a combined category in the reporting of federal statistics.64 In the epidemiologic literature, the two terms are often used interchangeably, a combined expression of “race/ethnicity” is often included in analytic models, and the terms are rarely precisely defined or described by researchers.65,66,67,68

The epidemiologic literature also reflects the fluid and ill-defined categorization of race and ethnicity. Related to child health, natality statistics from the National Center for Health Statistics prior to 1989 reported the race of a newborn based on the race of both parents. However, when parents were of different races and one parent was white, the child was assigned the race of the non-white parent.69 These practices were rooted in the “one-drop rule” (laws from the 1700s through the twentieth century, upheld by courts as late as 1985, which criminalized interracial marriages, designated a White person as one “who has no trace whatsoever of any blood other than Caucasian,” and took a “fractional, blood-borne approach” to define who was Black), which reinforced white superiority and the notion that being assigned to a white race was a privilege only for those of solely white generational lines. Since 1989, the race of the newborn has been based on the race of the mother alone.70,71 In another example, Comstock et al.67 reported in their review of articles published in the American Journal of Epidemiology and the American Journal of Public Health from 1996 to 1999 that the number of categories for race and/or ethnicity in the literature ranged from 0 to 24, with an average of 3.14.67 At another extreme, Flores11 showed in a review of studies exploring racial/ethnic disparities in the health and health care of US children that combining all non-white children into one group occurred in 9% of the 122 studies excluded from their final analysis.11 Researchers often choose to combine or split certain categories, either based on the granularity of information available or to ensure adequate statistical power.11,67 Importantly, the majority of studies fail to explain or justify how race and/or ethnicity information was collected or combined, thus obscuring the process to readers.1,66,67,68,72

Studies have shown that race and/or ethnicity are often conceptualized as proxy measures for other concepts that are known or believed to be correlated with them (e.g., poverty, discrimination, cultural factors, structural racism, or unspecified biological differences).14,73,74 Walsh and Ross14 showed that in articles published in three general pediatric journals (Pediatrics, Journal of Pediatrics, and Archives of Pediatrics and Adolescent Medicine) between July 1999 and June 2000, 35% of the articles that reported race and/or ethnicity data did not report any socioeconomic information (40/115) and 24% that discussed race and/or ethnicity did not discuss socioeconomic factors (11/45), leading the authors to conclude that researchers are using race and/or ethnicity as an explanatory variable to represent poverty.14 Race is often included in clinical algorithms with no description of why racial differences in outcomes may exist, despite the inherently causal interpretation of such algorithms. If racism, socioeconomic differences, or other societal factors explain the differences in clinical outcomes, including race in such predictive models may actually increase disparities in health outcomes.75

In 1993, the Centers for Disease Control and Prevention recommended that researchers clearly indicate their reason(s) for analyzing data on race and ethnicity.76 Subsequently, in 2000 and 2001, the American Academy of Pediatrics’ Committee on Pediatric Research as well as the editors of the Archives of Pediatrics and Adolescent Medicine recommended that researchers not use race and/or ethnicity as explanatory variables in place of target underlying concepts (e.g., poverty or racism) that can and should be measured directly.12,77 More recently, the American Academy of Pediatrics has shifted to prioritizing the role of racism, the “system of structuring opportunity and assigning value based on the social interpretation of how one looks (which is what we call ‘race’),” rather than race itself, in investigations of trends in child health.78 Despite these recommendations, the practice of inserting race and/or ethnicity covariates continues and has, in fact, been found to be increasing in child health research.16,66,67

Relatively little has been published on appropriate analytic methods for including race and/or ethnicity in models when justified by an underlying conceptual framework.79 LaVeist suggested that, instead of merely “controlling” for race, researchers either report models stratified by race groups or specify a multiplicative interaction term between the race variable and each of the other independent variables to explore more fully the effects of race in the analysis.35 Interpretations of these models move us toward understanding how our exposures or interventions might operate differently in one group than in another, rather than erroneously attributing differences in treatment effects to race itself. Other notable guidance includes Jones’ recommendations for use of race in epidemiology, Kaufman and Cooper’s statements on valid approaches to using race in biomedical research, and VanderWeele’s approaches to causal interpretations of race.34,79,80 As decisions about how to capture race and ethnicity continue to evolve and allow for more complex self-identification, researchers will need to be more thoughtful about how best to categorize people for analysis.
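The contrast between “controlling” for a group variable and the stratified or interaction approaches can be sketched in a few lines of Python. This is a minimal illustration under strong assumptions: the data are synthetic and noise-free, the group labels "A" and "B" are hypothetical placeholders rather than categories from any dataset, and the helper names are our own. It relies on two standard least-squares results: a model with a group indicator but no interaction estimates a single slope pooled from the within-group variation, and in a model with a full exposure-by-group interaction the interaction coefficient equals the difference between the stratum-specific slopes.

```python
# Synthetic, noise-free (exposure, outcome) pairs invented for illustration.
# "A" and "B" are hypothetical group labels.
data = {
    "A": [(x, 1.0 + 2.0 * x) for x in range(5)],  # exposure-outcome slope 2.0
    "B": [(x, 3.0 + 0.5 * x) for x in range(5)],  # exposure-outcome slope 0.5
}


def centered_sums(points):
    """Return (Sxx, Sxy): sums of squares and cross-products about the group means."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxx, sxy


def stratified_slopes(groups):
    """Fit outcome ~ exposure separately within each group."""
    slopes = {}
    for label, points in groups.items():
        sxx, sxy = centered_sums(points)
        slopes[label] = sxy / sxx
    return slopes


def adjusted_common_slope(groups):
    """Slope from outcome ~ exposure + group indicator (no interaction).

    With group-specific intercepts, least squares pools the within-group
    variation, so a single slope is shared by all groups.
    """
    sums = [centered_sums(points) for points in groups.values()]
    return sum(sxy for _, sxy in sums) / sum(sxx for sxx, _ in sums)


slopes = stratified_slopes(data)
common = adjusted_common_slope(data)
# Interaction coefficient in outcome ~ exposure * group, with "A" as reference.
interaction = slopes["B"] - slopes["A"]

print(f"stratified slopes: {slopes}")
print(f"adjusted common slope: {common:.2f}")
print(f"interaction coefficient (B vs. A): {interaction:.2f}")
```

The adjusted model returns one slope that describes neither group, whereas the stratified slopes and the interaction coefficient make the difference in the exposure–outcome relationship explicit. In applied work, these quantities would be estimated with a regression package and reported with measures of uncertainty.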

In summary, despite representing two different social constructs, race and ethnicity are often combined in epidemiologic inquiry, and frequently included in analytic models either as poor proxies for other constructs or without any justification at all. Even when appropriately justified in the conceptual framework, further research is needed as to how best to include race or ethnicity in child health research.

Socioeconomic position

There are numerous terms to describe socioeconomic conditions, such as poverty, socioeconomic status, SEP, social class, social stratification, and social inequality. In general, these terms are used by researchers interchangeably, in spite of their different origins, theoretical bases and interpretations.81 For the purposes of this discussion, we will use the term SEP to refer to all of these sociologic concepts. SEP is distinguished from social class or socioeconomic status in that it encompasses both material- or resource-based and prestige-based measures of socioeconomic groupings.82,83 In epidemiologic studies on child health, commonly used SEP indicators include parental (mother and/or father) education and occupation, household income, wealth, poverty level, living conditions, neighborhood socioeconomic characteristics, and a variety of composite scales which consolidate multiple domains into a single construct.13,16,84,85,86 SEP is relatively frequently reported in the child health literature, and has increasingly been highlighted as an underlying determinant of a variety of child health outcomes.16,87

There has been much controversy over which dimensions best assess SEP. SEP is widely acknowledged to be a multidimensional construct comprising diverse socioeconomic factors, and different indicators are often used to describe correlated but distinct aspects of SEP.8 For example, income and wealth most directly measure material circumstances, whereas education can reflect a range of noneconomic social characteristics, including general and health-related knowledge.88 However, over the past three decades, use of a single indicator to “control for SEP” has been commonly noted in the literature.89,90 For example, education is often used as a proxy for income, and income is often used as a proxy for wealth.13,90 Although SEP indicators have been widely assumed to be correlated, studies have indicated that these correlations are generally not strong enough to justify using one as a proxy for all others.17,90,91,92 Braveman et al.90 analyzed five nationally or state-wide-representative data sources, and reported that the income–education correlation is mostly <0.5.90 Researchers have been recommending the use of more than one indicator to measure and represent SEP over the past several decades.13,91,93 Potential advantages of doing so specifically in child health research include both improving the accuracy of the measurement of the construct and allowing for a fuller understanding of the mechanistic pathways in the relationships between SEP and child health.94
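Before treating one SEP indicator as a stand-in for another, a researcher working with secondary data can check how strongly the indicators are actually correlated in that dataset. The following minimal Python sketch computes a Pearson correlation between two indicators. The values are invented for illustration, the variable and function names are hypothetical, and the 0.5 cutoff is used only as an illustrative reference point echoing the figure reported above, not as a formal decision rule.

```python
import math

# Hypothetical values for five households, invented for illustration only.
parental_education_years = [10, 12, 14, 16, 18]
household_income_thousands = [70, 40, 50, 60, 80]

# Illustrative reference point only; not an established threshold.
ILLUSTRATIVE_CUTOFF = 0.5


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(parental_education_years, household_income_thousands)
weak_proxy = abs(r) < ILLUSTRATIVE_CUTOFF

print(f"education-income correlation: r = {r:.2f}")
if weak_proxy:
    print("Indicators are only modestly correlated; consider modeling both.")
```

A modest correlation of this kind suggests that the two indicators capture different aspects of SEP, which is consistent with the recommendation to measure and model more than one indicator rather than relying on a single proxy.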

Beyond the choice of indicators, the practical use of SEP in statistical analysis has additional challenges. First, although an individual’s SEP may change over time, most epidemiologic research in child health relies on SEP ascertained at a single point in time.8 Second, children are dependent on their parents/caregivers. However, it is often unclear whose SEP characteristics should be measured and assigned, and under what circumstances. For example, there is evidence that the influence of maternal and paternal education and income is actually different for certain outcomes.95,96 Third, how to quantify certain indicators is not clear; geographic locale, calendar year, and individual demographics all affect what magnitude of difference in SEP indicators most influences health outcomes.8,28,97

In summary, there is no question that SEP affects child health and well-being. Improving our understanding of how best to characterize and analyze this construct to optimize potential interventions to improve child health is critical.

Discussion

Our social, economic, and physical environments are well recognized to influence child health, development, and well-being. Given the remarkable diversity of sex, gender identity, race, ethnicity, and SEP in children across the United States, it is incumbent upon pediatric and epidemiologic researchers to conduct their work in ways that promote inclusivity, understanding, and ultimately a reduction in inequities. In this paper, we underscore problems in current research with the conceptualization, categorization, and analysis of these core sociodemographic constructs. Current research often utilizes an approach of “convenience” in how data related to these constructs are collected, categorized, and included in models, and it is time for the field to be more systematic and thoughtful in its approach to understanding how sociodemographics affect child health.

Publicly available data from large studies or consortia can be leveraged for their large sample sizes and their demographically and geographically diverse populations. Researchers have discussed the numerous benefits of promoting access to research data.98,99 Specific to child health, examples in the literature illustrate how accessing publicly available data can advance knowledge beyond what most smaller single cohorts could answer related to important outcomes such as obesity, mental health, and mortality.100,101,102 Entire datasets from large, often nationally representative studies or surveys such as the National Survey of Children’s Health and the FLASHE study are available for public use.103,104 Data from a consortium of child cohorts called the Environmental Influences on Child Health Outcomes will be available in the near future.105 What is missing from the literature is guidance on how the research community can meet its obligation to improve the discourse related to sociodemographic characteristics and disparities in ways that work to reduce inequities across all subpopulations.

Our paper has several limitations. First, we do not consider how to improve data collection or measurement of these constructs in child health research. While this article focuses on recommendations for users of data from repositories or publicly available sources, we do believe there is a need for future work discussing optimal approaches for defining, measuring, and collecting sociodemographic data in child health research. Second, there are several social characteristics that are not discussed in this paper, such as sexual orientation and immigration status. Third, in this paper, we do not consider ways to improve multilevel research, such as how best to characterize SEP when considering the influence of one’s neighborhood on one’s health. Although outside the scope of this discussion, we believe these are critical concepts that should be considered in the future.

We offer suggestions for how scholars can improve the discourse around sex, gender identity, race, ethnicity, and SEP in child health research. Improving the characterization and interpretation of child health studies with regard to core sociodemographic constructs is a critical component of optimizing child health and reducing inequities in the health and well-being of all children across the United States.