The World Health Organization (2002) defined health as functioning and well-being in physical, mental, and social health. Generic self-report measures are designed to be generally relevant and to be used to assess different subgroups of people (Patrick & Deyo, 1989). There is an emerging interest in the concept of whole-person health: “engaging with the whole person, not just the physical body but the emotional, mental and spiritual aspects as well is critical to healing” (Kligler, 2022). An overall whole-person score may provide a useful summary of perceived health. For example, Yin et al. (2016) found that a single factor accounted for the covariation among four items included in the United States Behavioral Risk Factor Surveillance System and recommended the use of a single summary score to monitor population trends. However, the reasonableness of an overall whole-person health measure needs to be evaluated in comprehensive patient-reported health measures (Chi et al., 2023).

Prior work has consistently found multiple correlated underlying dimensions for self-reported health. There is extensive evidence in support of underlying physical and mental health dimensions (Essink-Bot et al., 1997; Hays et al., 2009; Hays & Stewart, 1990). For example, a confirmatory factor analysis of a 64-item measure administered to individuals with HIV supported a physical health factor defined by physical function, role function, freedom from pain, disability days, and quality of sex life, and a mental health dimension defined by the overall quality of life, emotional well-being, hopefulness, lack of loneliness, will to function, quality of family life, quality of friendships, and cognitive function/distress, with physical and mental health correlated 0.31 with one another (Hays et al., 1995). Correlations of 0.62 and 0.66, respectively, between physical and mental health factors were found for the RAND-36 (Farivar et al., 2007; Hays et al., 1998). Rasch-derived physical and mental health scores from the SF-36 correlated 0.74 (Chang et al., 2007). Finally, the correlation between Patient-Reported Outcomes Measurement Information System (PROMIS)-29 profile physical and mental health summary scores was 0.69 (Hays et al., 2018). In summary, prior work indicates that self-reported health is multidimensional with substantial correlations among the underlying dimensions.

We extend the work of Yin et al. (2016) by evaluating the extent to which an overall health summary score is supported in a comprehensive set of multi-item health measures. Generic health profile measures yield scores on multiple domains. In this study, we include widely used measures from the PROMIS project, a United States National Institutes of Health Roadmap initiative to create item banks for use in the general population and for different medical conditions that have been touted as state-of-the-science instruments (Cella et al., 2019; Kaplan & Hays, 2022). We supplement the PROMIS measures with the Personal Wellbeing Index, a subjective well-being measure endorsed by the World Health Organization and the Organization for Economic Cooperation and Development (Cummins et al., 2003). Generic health preference-based measures yield a single score anchored at 0 = dead and 1 = perfect health. We include the latest version of the most widely used health preference-based measure in the world (Brazier et al., 2017).

The inclusion of an important profile measure in the United States, a worldwide subjective well-being measure, and the most widely used preference-based measure makes it possible to thoroughly evaluate the plausibility of a summary indicator of whole-person health. Because low back pain is common and regarded as the leading cause of years lived with disability worldwide (Hoy et al., 2014; Wu et al., 2020), this study focuses on adults with low back pain.

Methods

We administered surveys in English to members of KnowledgePanel®, an online panel that relies on probability-based sampling methods for recruitment and provides a representative sample of non-institutionalized adults 18 and older residing in the United States (Ipsos, 2018). Data were collected at baseline and six months later for a subsample with back pain. The PROMIS-29 + 2 and the EQ-5D-5L were administered at both time points. Personal well-being and the PROMIS social isolation scale were only included in the 6-month survey. The analyses reported here are limited to the 6-month data to assess the dimensionality of these health measures.

At baseline, the survey vendor (Ipsos) sent an email invitation to 7224 KnowledgePanel members on September 22, 2022, and gave them 10 days to complete the survey. Email reminders were sent to non-responders on Day 3 of the field period. Additional reminders were sent to the remaining non-responders every 3 days for up to 10 days. Upon completion, respondents received an entry into the KnowledgePanel sweepstakes. Fifty-seven percent (n = 4117) completed the survey. We excluded 19 respondents who reported having one or two of the fake health conditions that were included in the survey to identify careless or insincere respondents (Hays et al., 2023), resulting in a baseline sample of 4098. The 6-month survey was offered only to those who reported back pain and did not endorse a fake health condition on the baseline survey. Seventy-nine percent of the eligible baseline respondents completed a 6-month survey (n = 1256).

Measures

PROMIS

The PROMIS ontology describes multiple domains within dimensions of physical, mental, and social health (Cella & Hays, 2022). Physical health is divided into physical symptoms and physical function; mental health is represented by affect, behaviors, and cognition; and social health includes social function and social relationships (e.g., family and friends).

The PROMIS-29 (Cella et al., 2019) is the most widely used PROMIS profile measure. It assesses pain intensity using a single 0–10 numeric rating item, and seven health domains (physical function, fatigue, pain interference, depression, anxiety, ability to participate in social roles and activities, and sleep disturbance) using 4 polytomous (5 response categories) items per domain. Support for the reliability and validity of the PROMIS-29 has been shown in several prior studies (Cook et al., 2021; McMullen et al., 2022; Pecorelli et al., 2023; Peipert et al., 2018). In addition to the PROMIS-29, the study included the 2-item cognitive function scale from the PROMIS-29 + 2. The cognitive function scale had a reliability of 0.77 in a prior study (Hays et al., 2023). The PROMIS 4-item social isolation scale was also administered (Hahn et al., 2014).

All PROMIS measures are scored on a T-score metric with a mean of 50 and SD of 10 in the general U.S. population. The exception is sleep disturbance, for which scores are relative to a combination of the general population and clinical patients.
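The T-score metric described above is a linear rescaling of a standardized score. The following is a minimal sketch of that rescaling only; it assumes a score `theta` that is already standardized to mean 0 and SD 1 in the reference population. The function name `t_score` is hypothetical, and the item response theory calibration that produces such scores in practice is not shown.

```python
def t_score(theta: float) -> float:
    """Rescale a standardized score (mean 0, SD 1 in the reference
    population) to the T-score metric (mean 50, SD 10)."""
    return 50.0 + 10.0 * theta


# Hypothetical standardized scores, for illustration only.
for theta in (-1.0, 0.0, 1.5):
    print(f"theta = {theta:+.1f} -> T = {t_score(theta):.1f}")
```

Under this rescaling, a respondent one SD below the reference-population mean receives a T-score of 40.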

Personal Well-Being

To ensure adequate representation of subjective well-being (Cummins et al., 2004), we administered 10 items developed by the International Wellbeing Group (2013). Eight of the items used a 0 (No satisfaction at all) to 10 (Completely satisfied) response scale and asked “How satisfied are you with”:

  1) your standard of living?
  2) your health?
  3) what you are achieving in life?
  4) your personal relationships?
  5) how safe you feel?
  6) feeling part of your community?
  7) your future security?
  8) your spirituality or religion?

A ninth question asked, “Overall, how satisfied are you with your life as a whole these days?” (0 = Not satisfied at all, 10 = Completely satisfied). The final question asked, “Overall, to what extent do you feel the things you do in your life are worthwhile?” (0 = Not at all worthwhile, 10 = Completely worthwhile). Seven of the items are used in scoring the Personal Wellbeing Index (Cummins et al., 2003). Test-retest (intraclass correlation) reliability of 0.84 for a 1–2-week interval was found (Lau & Cummins, 2005). We created a 10-item average personal well-being score with a 0–10 possible score range.
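The 10-item average score can be sketched as follows. This is an illustration under stated assumptions rather than the scoring code used in the study: the function name `personal_wellbeing_score` is hypothetical, and the sketch requires complete responses because the handling of missing items is not described here.

```python
from statistics import mean


def personal_wellbeing_score(responses):
    """Average ten 0-10 item responses into a single 0-10 score.

    Assumption: all ten items were answered; missing-data rules are
    not specified in the text, so incomplete input is rejected.
    """
    if len(responses) != 10:
        raise ValueError("expected responses to all 10 items")
    if any(not 0 <= r <= 10 for r in responses):
        raise ValueError("each response must be between 0 and 10")
    return mean(responses)


# Hypothetical respondent, for illustration only.
example = [7, 6, 8, 9, 8, 5, 6, 7, 7, 8]
print(personal_wellbeing_score(example))  # 7.1
```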

EQ-5D-5L

The EQ-5D-5L items refer to “Your health today” and assess mobility, self-care, usual activities, pain/discomfort, and anxiety/depression with five response options (no problems, some problems, moderate problems, severe problems, and extreme problems), with 3125 possible health states (Herdman et al., 2011). The EQ-5D-5L U.S. weights we use were derived using Tobit modeling of time trade-off preference elicitation (Pickard et al., 2019). Extensive support for the reliability and validity of the EQ-5D-5L has been reported (Feng et al., 2021). The EQ-5D-3L was included along with the SF-36 to evaluate the dimensionality of self-reported health in a prior study (Essink-Bot et al., 1997). Because the EQ-5D-5L preference-based score combines information from physical, mental, and social health, we hypothesized that it would be a good indicator of overall health and be useful in assessing the reasonableness of the overall whole-person health dimension. Palimaru and Hays (2017) found that 69% of the variance in overall quality of life was accounted for by the EQ-5D-3L and PROMIS global health items.
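The count of 3125 possible health states follows from five dimensions with five response levels each (5^5 = 3125). The short sketch below enumerates them. Labeling each state as a five-digit string of levels 1 to 5 is an assumption made for illustration.

```python
from itertools import product

DIMENSIONS = (
    "mobility",
    "self-care",
    "usual activities",
    "pain/discomfort",
    "anxiety/depression",
)
LEVELS = range(1, 6)  # 1 = no problems ... 5 = extreme problems

# One state per combination of levels across the five dimensions.
states = [
    "".join(str(level) for level in combo)
    for combo in product(LEVELS, repeat=len(DIMENSIONS))
]

print(len(states))            # 3125
print(states[0], states[-1])  # 11111 55555
```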

Demographic Variables and Medical Conditions

We assessed gender, age, education, race/ethnicity, marital status, whether working full time, and the presence of 20 medical conditions (see Table 1).

Table 1 Characteristics of the Sample (n = 1256)

Subjects

Table 1 shows that 52% of the back pain sample that completed the 6-month survey were female. The mean age was 55 (18–94 range). Thirty-five percent of the sample had a high school education or less. The majority were non-Hispanic White (74%), with 10% Hispanic, and 8% non-Hispanic Black. Most were married (61%). Thirty-six percent were working full-time. The most common medical conditions reported other than back pain were allergies (58%), hypertension and high cholesterol (47% each), and arthritis (46%).

Analysis Plan

We provide means, standard deviations, and internal consistency reliability (Cronbach, 1951) estimates for the PROMIS scales and the personal well-being scale. For the pain intensity item and the EQ-5D-5L, we provide means, standard deviations, and test-retest (stability) estimates over 6 months (baseline to 6 months later). In addition, we report product-moment correlations among the measures. Then we estimate confirmatory factor analysis models.
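For readers unfamiliar with the internal consistency estimate, coefficient alpha (Cronbach, 1951) can be computed from item variances and the variance of the summed score. The sketch below is a minimal illustration; the function name `cronbach_alpha` and the small data set are hypothetical, and the study's analyses were run in SAS rather than with this code.

```python
from statistics import variance


def cronbach_alpha(items):
    """Coefficient alpha for a multi-item scale.

    items: list of k lists, one per item, each holding that item's
    responses across the same respondents in the same order.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha requires at least two items")
    # Total score per respondent (sum across items).
    totals = [sum(values) for values in zip(*items)]
    sum_item_variances = sum(variance(column) for column in items)
    return (k / (k - 1)) * (1 - sum_item_variances / variance(totals))


# Hypothetical two-item scale answered by four respondents.
item_a = [1, 2, 3, 4]
item_b = [2, 1, 4, 3]
print(round(cronbach_alpha([item_a, item_b]), 2))  # 0.75
```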

We evaluated one-factor, two-factor, three-factor, and bifactor models (Reise et al., 2007; Rodriguez et al., 2016). For the one-factor model, loadings for all measures were estimated on the single factor. For the two-factor model, physical health was defined by physical function, pain interference, and pain intensity while mental health was defined by depression, anxiety, social isolation, personal well-being, cognitive function, and sleep disturbance. Fatigue, the ability to participate in social roles and activities, and the EQ-5D-5L were allowed to load on both physical and mental health. For the three-factor model, physical health was defined by the same domains as for the two-factor model. Mental health was defined by depression, anxiety, personal well-being, cognitive function, and sleep disturbance. Social health was indicated by the ability to participate in social roles and activities, and social isolation. Fatigue was allowed to load on both physical and mental health. The EQ-5D-5L was allowed to load on physical, mental, and social health. The bifactor model included all 12 measures loading on the general factor, 3 loadings on the physical health group factor (physical function, pain interference, pain intensity), and 4 loadings on the mental health group factor (depression, personal well-being, anxiety, social isolation). We assess model fit using the comparative fit index (CFI) and the root mean square error of approximation (RMSEA). CFI values of about 0.95 or higher and RMSEA values of about 0.06 or lower are indicators of a good practical fit of the model to the data (Hu & Bentler, 1999; MacCallum et al., 1996). In addition, we report the explained common variance for the general factor. We also assess the fit of the bifactor model in age, gender, education, and race/ethnicity subgroups. Finally, to evaluate the potential effects of including the EQ-5D-5L on the loadings for the other indicators in the factor analyses, we estimated the bifactor model excluding the EQ-5D-5L.
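Two quantities in this plan are simple enough to sketch: model degrees of freedom and the explained common variance (ECV). The sketch below assumes that factor variances are fixed at 1 and that each model estimates its loadings, 12 unique variances, and any factor correlations. This counting convention is an assumption, as are the function names and the example loadings, which are hypothetical and are not estimates from this study.

```python
def cfa_degrees_of_freedom(n_indicators, n_free_parameters):
    """df = unique variances and covariances minus free parameters."""
    n_moments = n_indicators * (n_indicators + 1) // 2
    return n_moments - n_free_parameters


def explained_common_variance(general_loadings, group_loadings):
    """Share of common variance attributable to the general factor.

    general_loadings: standardized loadings on the general factor.
    group_loadings: one list of standardized loadings per group factor.
    """
    general = sum(l ** 2 for l in general_loadings)
    group = sum(l ** 2 for factor in group_loadings for l in factor)
    return general / (general + group)


# Parameter counts under the assumed convention (12 indicators).
print(cfa_degrees_of_freedom(12, 12 + 12))      # one-factor: 54
print(cfa_degrees_of_freedom(12, 15 + 12 + 1))  # two-factor: 50
print(cfa_degrees_of_freedom(12, 15 + 12 + 3))  # three-factor: 48

# Hypothetical loadings: equal general and group contributions.
print(explained_common_variance([0.6, 0.6], [[0.6, 0.6]]))  # 0.5
```

The two- and three-factor counts use 15 loadings because the indicators allowed to load on more than one factor contribute one loading per factor.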

Analyses were conducted using SAS 9.4 (SAS Institute, Cary, NC).

Results

Descriptive statistics for the measures are shown in Table 2. The sample reported substantially more pain intensity, more pain interference, and worse physical function than the United States general population. Internal consistency reliability coefficients ranged from 0.82 to 0.95, and 6-month test-retest correlations were 0.68 for the pain intensity item and 0.77 for the EQ-5D-5L. As seen in Table 3, the absolute value of product-moment correlations among the measures ranged from 0.25 (social isolation with physical function and pain intensity) to 0.83 (depressive symptoms and anxiety). The median of the absolute value of the correlations was 0.52.

Table 2 Mean, Standard Deviation, and Reliability of the Measures
Table 3 Product-Moment Correlations Among PROMIS-29 + 2, EQ-5D-5L, Personal Well-Being, and Social Isolation Measures

Because of the large sample size, all confirmatory factor analysis models were statistically rejectable at p < .0001. The one-factor model did not fit the data: χ2 (54 df) = 2794.19 (CFI = 0.74, RMSEA = 0.20). The two-factor model estimated a 0.50 correlation between physical and mental health. That model fit the data better than the one-factor model, but it still did not meet the thresholds for acceptable practical fit: χ2 (50 df) = 759.51 (CFI = 0.93, RMSEA = 0.11). The three-factor model fit the data less well than the two-factor model: χ2 (48 df) = 1059.96 (CFI = 0.90, RMSEA = 0.13). Moreover, the estimated correlation between mental and social health was 0.97. The bifactor model with a general health factor and physical health and mental health group factors provided an acceptable fit: χ2 (36 df) = 170.51 (CFI = 0.99, RMSEA = 0.05). Model fit in all the age, gender, education, and race/ethnicity subgroups was similar to that in the overall sample (Table 4).
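The RMSEA values above can be approximately reproduced from the reported chi-square statistics, degrees of freedom, and the sample size of 1256. The sketch below uses a common form of the RMSEA formula with N - 1 in the denominator. Some software uses N instead, and which form the original analysis used is an assumption here. The function name `rmsea` is hypothetical.

```python
from math import sqrt


def rmsea(chi_square, df, n):
    """Root mean square error of approximation (N - 1 form)."""
    return sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


N = 1256  # 6-month sample size reported in the text

# (chi-square, df) pairs as reported for each model.
models = {
    "one-factor": (2794.19, 54),
    "two-factor": (759.51, 50),
    "three-factor": (1059.96, 48),
    "bifactor": (170.51, 36),
}

for name, (chi_square, df) in models.items():
    print(f"{name}: RMSEA = {rmsea(chi_square, df, N):.2f}")
```

Rounded to two decimals, this gives 0.20, 0.11, 0.13, and 0.05, matching the reported values. The CFI cannot be checked the same way because it also requires the chi-square of the baseline (independence) model, which is not reported here.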

Table 4 Fit of Bifactor Model in Overall Sample and Age, Gender, Education, and Race/ethnicity Subgroups

Table 5 shows the 12 loadings on the general factor, 3 loadings on the physical health group factor, 4 loadings on the mental health group factor, and 10 correlated uniqueness estimates. Loadings on the general health factor were larger than on a group factor for 9 of the 12 measures, but physical function, pain interference, and pain intensity loaded slightly more on the physical health group factor than on the general health factor. The general factor explained 78% of the common variance. The bifactor model that excluded the EQ-5D-5L fit the data equally well (CFI = 0.99, RMSEA = 0.05) and yielded standardized factor loadings that were virtually identical to those from the model with all 12 indicators (Table 5).

Table 5 Standardized Confirmatory Factor Loading Matrix for PROMIS-29 + 2, EQ-5D-5L, Personal Well-being, and Social Isolation Measures from the Bifactor Model (Estimates from Model Excluding EQ-5D-5L shown within parentheses)

Discussion

This study of adults with back pain shows that the nine measures from the PROMIS-29 + 2, the PROMIS social isolation scale, the personal well-being measure, and the EQ-5D-5L are substantially intercorrelated and, at a higher level, represent a single underlying dimension of health. The findings extend the Yin et al. (2016) finding of a single factor underlying four items to a more comprehensive collection of self-report measures of health. The results suggest that it is reasonable to combine PROMIS measures into a single score using factor-scoring coefficients as weights or a preference-based score (Dewitt et al., 2018).

Based on the two-factor model, one might have concluded, consistent with previous research, that there are two dimensions: physical and mental health. However, by specifying a bifactor model, we were able to evaluate the extent to which the 12 measures represented a general health concept. There was a strong indication of the presence of a general health factor as well as unique information shared among three physical health indicators (physical function, pain interference, and pain intensity) that a National Institutes of Health (NIH) Pain Consortium steering committee research task force proposed combining into an Impact Stratification Score for assessing adults with chronic low back pain (Deyo et al., 2014; Hays et al., 2021). The largest loading on the general health factor was observed for the PROMIS fatigue scale and 10 of the 12 indicators had standardized loadings (absolute value) of 0.62 or larger.

Consistent with most prior work, we failed to identify a separate social health factor despite including the PROMIS social isolation scale and the ability to participate in social roles and activities scale. One prior confirmatory factor analysis of PROMIS measures suggested three underlying factors: physical health represented by PROMIS measures of physical function, pain interference, pain behavior, and fatigue; mental health represented by anger, anxiety, depression, and fatigue; and social health represented by social role performance and social role satisfaction (Carle et al., 2015). However, the social health factor correlated more strongly with physical and mental health (0.67 and 0.68, respectively) than physical and mental health correlated with one another (r = 0.57), suggesting that social health was not a distinct dimension.

This study has limitations. The generalizability of the results is uncertain given the sample was limited to adult members of KnowledgePanel with back pain who responded to a baseline and follow-up survey. In addition, a wide range of measures was administered, but it is possible that stronger support for a social health factor would be obtained if additional measures had been included. Nonetheless, the study provides important information about the dimensionality of self-reported health.

Future research is needed to put this work in the context of emerging interest in whole-person health. For example, the U.S. Veterans Health Administration emphasizes the importance of whole-person health as part of comprehensive health care (Kligler, 2022). A deeper understanding of the dimensionality of health measures can contribute substantially to efforts to define and measure whole-person health. The reasonableness of single summary indexes such as the whole health index must also be evaluated further (Chi et al., 2023).