The factor structure and subscale properties of the pain catastrophizing scale: are there differences in the distinctions?

As presently constructed, the Pain Catastrophizing Scale failed to discriminate effectively among levels of rumination, magnification, and helplessness in a sample of patients with chronic pain.


Introduction
In a 1983 study, Rosenstiel and Keefe identified 8 cognitive and behavioral strategies that patients used to cope with back pain, one of which they named "pain catastrophizing." 31 Subsequently, Sullivan et al. 38 expanded measurement of pain catastrophizing by developing the Pain Catastrophizing Scale (PCS), a longer and more comprehensive measure of the construct. The PCS is widely used in international clinical care and research and has been translated into multiple languages. 37 The measure targeted the 3 subdomains that predominated in research in the field: rumination, magnification, and helplessness. The initial analyses supported a 3-factor structure based on an exploratory principal-components analysis of PCS item responses from 429 undergraduate psychology students.
Although some studies have supported a 3-factor structure of the PCS, others have not. Osman et al. 24 attempted to replicate this factor structure in a sample of 288 undergraduate students, including a subsample who were seeking care at a university health clinic. Their exploratory principal-components analysis supported a 2-factor rather than a 3-factor solution. When they specified 3 factors, however, their findings were similar to those of Sullivan et al. 38 In a follow-up study based on PCS responses of 215 community members and 60 pain outpatients, Osman et al. 23 conducted a higher-order latent factor analysis with rumination, helplessness, and magnification as second-order factors. Their results strongly supported pain catastrophizing as a single construct composed of the 3 hypothesized subdomains, a finding that has been replicated for English 11 and non-English versions of the PCS 20,42 and in a recent study of 675 Chinese and non-Chinese patients with knee osteoarthritis. 22 Further, in a study of the pediatric and parent versions of the PCS, Pielech et al. 25 found no support for the 3-factor structure and concluded that the responses to the items were unidimensional for both versions. These results raise doubt regarding the PCS's success in measuring 3 distinguishable domains of pain catastrophizing.
With the exception of the study by Ong et al. 22 that used second-order factor analysis in a large clinical sample, previous studies either failed to use higher-order factor models or included few or no people with chronic pain (N = 60 in 1 study, 23 N = 146 in another, and N = 0 in others). 11,24,38 The purpose of the current study was to apply bifactor analyses to PCS item responses from a large clinical chronic pain population to evaluate the dimensional structure of the PCS and identify appropriate scoring strategies. Bifactor modeling was chosen over second-order factor analysis because the former provides estimation of the incremental reliability of subscale scores beyond that provided by the overarching general factor.
We undertook this study because of our interest in measuring patient outcomes with the least amount of response burden. Increasingly, patient-reported outcomes are measured using item banks that have been calibrated to an item response theory (IRT) model 6,7,35 and administered using computer adaptive testing (CAT). 33 In CAT administrations, the selection of items from IRT-calibrated item banks is tailored to individual respondents. This approach increases the ratio of measurement precision to response burden. 8 However, the IRT models most commonly used for CAT require "essential unidimensionality"; that is, a large portion of the variance in item responses can be accounted for by a single, overarching domain.

Sample and design
Data are routinely collected from patients seeking care at the Stanford University Pain Management Center using the Collaborative Health Outcomes Information Registry (CHOIR), an open-source learning health care system platform. 36 As a part of this learning health care system, patients completed a battery of surveys from the Patient-Reported Outcome Measurement Information System (PROMIS) 6 and legacy measures including the PCS. 38 The patient population is a heterogeneous mix of people with pain of various etiologies including neuropathic pain, musculoskeletal pain, headache, fibromyalgia, and a small minority of cancer pain. Data are collected at point of care. This study is a retrospective, methodological study of a subsample of N = 8369 CHOIR respondents.

Pain catastrophizing scale
The PCS is a 13-item, self-report measure of pain catastrophizing. 38 Respondents are asked to rate the frequency of pain-related thoughts and feelings they have in response to actual or anticipated pain using a 5-point scale that ranges from 0 = not at all through 4 = all the time. The PCS was developed to be scored both as a total score to represent pain catastrophizing and as subscale scores representing rumination, magnification, and helplessness. The PCS is a widely used measure that has been translated from English into many other languages. 5,20,34,39,42 Scores on the PCS have been found to be responsive to changes in catastrophizing after intervention. 32

Demographics
Demographic variables are collected as part of the CHOIR. The deidentified data set for the current study, however, did not include these. We present demographics for a random sample of CHOIR data equal to the sample size of the study data (N = 8350). These data were used to characterize the sample with respect to sex, race, education, and age.

Pain intensity
In CHOIR, patients report 7-day average pain intensity using a numerical rating scale (NRS) that ranges from 0 to 10. The NRS is commonly used to measure pain in chronic pain conditions and is highly correlated with other pain measures such as the visual analog scale.
PROMIS measures are scored on a T-score metric with a mean of 50 and a standard deviation of 10. The mean of 50 is calibrated to either the mean of a US reference sample that matched the 2000 General Census sample with respect to age, sex, race/ethnicity, and education or to a clinically relevant population. 13,17 The PROMIS measures reported here were centered on the US general population representative sample. All PROMIS measures are scored such that higher scores indicate more of the symptom or function being measured.

Statistical analyses
Data were managed and descriptive analyses completed using IBM SPSS 25. 9 Using the sampling option in SPSS, approximately 50% of the sample was selected to serve as the modeling sample (Sample MODEL). The unselected cases were used for cross-validation (Sample X-VALID). Exploratory factor analyses (EFAs), confirmatory factor analyses (CFAs), and bifactor analyses were performed separately on each sample. All other analyses were conducted on the combined sample. Interitem consistency of the subscales and of the combined PCS items was measured using Cronbach's alpha.
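The split-sample design and the interitem consistency statistic described above can be sketched as follows. This is a minimal illustration in Python rather than SPSS, run on simulated stand-in data and not on CHOIR data. It assumes that the "approximately 50%" option selects each case independently with probability 0.5; the function names and simulation parameters are hypothetical.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    k = items.shape[1]
    sum_item_variances = items.var(axis=0, ddof=1).sum()
    total_score_variance = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - sum_item_variances / total_score_variance)

def approx_half_mask(n_rows: int, rng: np.random.Generator) -> np.ndarray:
    """Select each case with probability 0.5 (assumed analogue of the SPSS option)."""
    return rng.random(n_rows) < 0.5

# Simulated stand-in for 13 items scored 0 to 4 (hypothetical data).
rng = np.random.default_rng(0)
n_respondents, n_items = 1000, 13
trait = rng.normal(size=(n_respondents, 1))
noise = rng.normal(scale=0.8, size=(n_respondents, n_items))
scores = np.clip(np.round(trait + noise + 2.0), 0, 4)

model_mask = approx_half_mask(n_respondents, rng)
alpha_model = cronbach_alpha(scores[model_mask])    # modeling sample
alpha_xvalid = cronbach_alpha(scores[~model_mask])  # cross-validation sample
print(f"alpha (model) = {alpha_model:.3f}, alpha (x-valid) = {alpha_xvalid:.3f}")
```

Because the two halves are drawn at random from the same pool, their alpha values should differ only by sampling error, which is the logic behind using the second half for cross-validation.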
On each sample, we conducted 2 EFAs: one extracted 1 factor and the other extracted 3 factors. Two CFAs were conducted. One CFA posited that a single dimension explained the variance of responses to PCS items (CFA DIM=1). The other posited that response variance could be explained by 3 factors, each represented by the subset of items intended to target helplessness, magnification, or rumination (CFA DIM=3). An alternate hypothesis was modeled using a bifactor analysis. 29 The bifactor model posited that variance in responses was explained by 4 factors: a general factor on which all items loaded and 3 orthogonal factors representing the unique variance accounted for by each of the 3 subscales. The bifactor and second-order factor models are similar but vary in the constraints imposed. Mansolf et al. 18 published a detailed description and analysis of the distinctions between the models. All factor models were completed using MPLUS version 8, specifying polychoric correlations to account for the categorical nature of the data. 21 Factor loadings were estimated using maximum likelihood and weighted least square mean and variance adjusted estimators and were rotated using PROMAX rotation, which allows factors to be correlated with each other.
The CFA and bifactor models were compared based on traditional fit statistics (comparative fit index [CFI], Tucker-Lewis index [TLI], root mean square error of approximation [RMSEA], and standardized root mean square residual) and commonly applied criteria (CFI and TLI >0.95, RMSEA values <0.10, and standardized root mean square residual values <0.08). 3,4,14,16,19,27 For the bifactor model, we also calculated the omega-hierarchical (omega-H) statistic, which estimates the proportion of reliable variance associated with the general factor only, calculated as the ratio between general factor variance and total variance, including error. 28 As such, it is a reliability statistic for the general factor. A criterion of omega-H >0.8 has been recommended as a threshold for a measure's essential unidimensionality. 30 Omega-hierarchical subscale (omega-H SS) estimates the reliability of subscale scores after variance from the general factor has been extracted. We also calculated explained common variance (ECV), which is the proportion of the modeled variance that is accounted for by the general factor. 30 Explained common variance thresholds have been recommended for deciding on the value of subscale scores; ECV values below 0.70 suggest that subscale scores have added value, whereas values above 0.90 suggest that they do not. 26
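To make the bifactor indices above concrete, the following sketch computes omega-H, omega-H SS, and ECV from standardized bifactor loadings using the standard formulas for an orthogonal bifactor model, in which each item's unique variance is 1 minus its squared general and specific loadings. The loadings in the example are hypothetical and are not the PCS estimates; the function name is likewise an illustration, not part of MPLUS output.

```python
import numpy as np

def bifactor_indices(general, specific, groups):
    """Return (omega_h, omega_hs_by_group, ecv) from standardized loadings.

    general:  loading of each item on the general factor
    specific: loading of each item on its own group (subscale) factor
    groups:   subscale label for each item
    """
    general = np.asarray(general, dtype=float)
    specific = np.asarray(specific, dtype=float)
    groups = np.asarray(groups)
    uniqueness = 1.0 - general**2 - specific**2  # item error variance

    labels = list(dict.fromkeys(groups.tolist()))  # unique labels, order kept
    group_var = {g: specific[groups == g].sum() ** 2 for g in labels}

    # Total score variance = general + all group factors + error.
    total_var = general.sum() ** 2 + sum(group_var.values()) + uniqueness.sum()
    omega_h = general.sum() ** 2 / total_var

    omega_hs = {}
    for g in labels:
        m = groups == g
        subscale_var = general[m].sum() ** 2 + group_var[g] + uniqueness[m].sum()
        omega_hs[g] = group_var[g] / subscale_var

    common = (general**2).sum() + (specific**2).sum()
    ecv = (general**2).sum() / common
    return omega_h, omega_hs, ecv

# Hypothetical 4-item example with two subscales, A and B.
omega_h, omega_hs, ecv = bifactor_indices(
    general=[0.8, 0.8, 0.8, 0.8],
    specific=[0.2, 0.2, 0.2, 0.2],
    groups=["A", "A", "B", "B"],
)
print(round(omega_h, 3), {g: round(v, 3) for g, v in omega_hs.items()}, round(ecv, 3))
```

In this toy case the general factor dominates: omega-H is about 0.86 and ECV about 0.94, while each subscale retains less than 5% reliable variance of its own.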

Sample characteristics
The total sample size was 8369; N = 4179 comprised Sample MODEL and N = 4190 comprised Sample X-VALID. Table 1 summarizes the demographic characteristics and scale scores for the entire sample and also separately for Sample MODEL and Sample X-VALID. The 2 samples were virtually equivalent regarding the evaluated variables, which was unsurprising given the sample sizes. The mean and standard deviation of the sample's ages were 49.1 and 16.1, respectively. Of those who reported sex (N = 8207), 67.1% were women and 32.9% men. Of those reporting race (N = 6453), the distribution was 61.3% White, 3.4% African American, 7.9% Asian, 1.1% Native American/Pacific Islander, and 26.3% others. A total of 5749 reported level of educational attainment: 6.4% no high school, 7.7% high school, 61.7% college, and 23.5% graduate school.
Table 1 also summarizes scores for the NRS pain intensity scale and for the PROMIS measures. Self-reported 7-day average pain intensity for the full sample was 5.6, a level interpreted as "moderate pain" when pain is categorized as mild, moderate, and severe. 15 However, 35.5% of patients had "severe pain," defined as pain of 7 or greater. On average, patients' symptoms and function were worse compared with the PROMIS general US population on which the scores were centered. 17 Patients in the sample reported depression and anxiety symptoms that were approximately a half standard deviation greater than those of the PROMIS reference sample and fatigue that was almost a full standard deviation higher. The biggest differences compared with the PROMIS reference sample, however, were for physical function and pain interference, whose means in the full sample were 36.9 and 64.5, respectively.

Interitem consistency
Cronbach's alpha for the total PCS scale was 0.944. For the helplessness, magnification, and rumination subscales, values were 0.901, 0.760, and 0.913, respectively.

Factor analyses
The EFA, CFA, and bifactor results were virtually identical when conducted on Sample MODEL and Sample X-VALID. For example, the greatest differences between samples in the EFA, CFA DIM=1, CFA DIM=3, and bifactor loadings were 0.062, 0.019, 0.016, and 0.035, respectively. To conserve space, the results for Sample X-VALID are not shown but are available from the corresponding author. Table 2 reports the findings from the EFAs. The factor loadings for a single factor ranged from 0.673 ("I keep thinking of other painful events," magnification subscale) to 0.900 ("I keep thinking about how much it hurts," rumination subscale). The first and second eigenvalues were 8.610 and 0.761 (ratio of 11.3). The first factor accounted for 66.2% of the variance. These results support a single-factor solution.
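The eigenvalue figures reported above follow from simple arithmetic. For a correlation matrix of 13 items, the eigenvalues sum to 13, so the share of variance attributable to the first factor is its eigenvalue divided by 13. The short check below uses only the values reported in the text; the variable names are illustrative.

```python
n_items = 13              # PCS items; eigenvalues of a correlation matrix sum to n_items
first_eigenvalue = 8.610  # as reported
second_eigenvalue = 0.761 # as reported

eigenvalue_ratio = first_eigenvalue / second_eigenvalue
variance_share = first_eigenvalue / n_items

print(f"ratio of first to second eigenvalue: {eigenvalue_ratio:.1f}")
print(f"variance explained by first factor: {variance_share:.1%}")
```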

Exploratory factor analyses results
In a follow-up EFA, we extracted a 3-factor solution to explore the putative 3-factor structure of PCS responses. These results are also reported in Table 2. The loadings aligned with the hypothesized structure of the PCS except for 1 item, "there is nothing I can do to reduce the intensity of the pain," an item from the helplessness subscale that loaded below 0.40 and was more highly associated with the rumination items (0.377) than with the helplessness items (0.210). Intercorrelations among the 3 factors were high. Correlations with Factor 1 for Factors 2 and 3 were 0.717 and 0.742, respectively; the correlation between Factors 2 and 3 was 0.701.
Table 3 compares the results from the 2 CFA models in which a single factor and 3 factors are hypothesized (CFA DIM=1 and CFA DIM=3, respectively). Comparative fit index, TLI, and RMSEA values for the CFA DIM=1 model were 0.965, 0.957, and 0.136, respectively. For the CFA DIM=3 model, the values of CFI, TLI, and RMSEA were better (0.986, 0.983, and 0.084, respectively), but at the cost of parsimony.

Bifactor results
Finally, we fit a bifactor model in which all items loaded on a general factor and, in addition, loaded on a specific factor identified based on their subscale designations. Comparative fit index, TLI, and RMSEA values for the bifactor model were 0.993, 0.989, and 0.069, respectively. Table 4 displays the general and group factor loadings for the bifactor model. For ease of comparison, the CFA DIM=1 factor loadings are replicated in the table. The most salient finding is the degree of similarity between the factor loadings in the unidimensional CFA model and those for the general factor of the bifactor model (largest difference = 0.076). Larger variations would be expected when data have substantial multidimensionality. Also of note is the fact that the HELP subscale item, "there is nothing I can do to reduce the intensity of the pain," had a low negative loading on the HELP factor after the variance of the general factor was extracted; that is, the item accounted for no additional reliability once the general factor variance was accounted for. Bifactor statistics also confirmed the essential unidimensionality of the data. Omega-H was 0.97 for both Sample MODEL and Sample X-VALID, well above the recommended criterion of >0.8 for confirming a measure's essential unidimensionality. 30 Once the general factor was accounted for, the subscales accounted for only small amounts of variance. Omega-H SS values for the helplessness, magnification, and rumination subscales were 0.06, 0.00, and 0.14, respectively, in Sample MODEL and 0.08, 0.00, and 0.17 in Sample X-VALID. Explained common variance of the general factor was 0.96 for both samples, well above the recommended criterion of 0.90, suggesting that subscale scores do not add value.
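As a reading aid, the fit statistics reported for the three models can be checked against the criteria stated in the statistical analyses section (CFI and TLI >0.95, RMSEA <0.10). The sketch below uses only values reported in the text for the modeling sample; standardized root mean square residual is omitted because its values are not given in the text, and the function and dictionary names are illustrative.

```python
# Fit statistics as reported in the text (modeling sample).
reported_fit = {
    "CFA DIM=1": {"CFI": 0.965, "TLI": 0.957, "RMSEA": 0.136},
    "CFA DIM=3": {"CFI": 0.986, "TLI": 0.983, "RMSEA": 0.084},
    "bifactor": {"CFI": 0.993, "TLI": 0.989, "RMSEA": 0.069},
}

def check_fit(stats, cfi_min=0.95, tli_min=0.95, rmsea_max=0.10):
    """Flag which of the stated criteria a model's fit statistics satisfy."""
    return {
        "CFI": stats["CFI"] > cfi_min,
        "TLI": stats["TLI"] > tli_min,
        "RMSEA": stats["RMSEA"] < rmsea_max,
    }

fit_checks = {name: check_fit(stats) for name, stats in reported_fit.items()}
for name, checks in fit_checks.items():
    print(name, checks)
```

By these thresholds, all three models meet the CFI and TLI criteria, and the 3-factor and bifactor models also meet the RMSEA criterion, whereas the single-factor RMSEA of 0.136 does not.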

Discussion and conclusions
The results of this study provide substantial evidence for the essential unidimensionality of the PCS total scores. The EFA results did not warrant extracting more than one factor. When 3 factors were forced, the results aligned with the putative subscale structure with the notable exception of 1 item from the helplessness subscale that loaded 0.377 on rumination and only 0.210 on its designated subscale. The fit of a CFA model that posited a single factor was acceptable. A 3-dimensional model had better fit, but at the cost of parsimony.
The bifactor results were the most telling because they quantified the reliability of the subscale scores after a general factor was extracted. Whereas the reliability (omega-H) for the general factor was above 0.96, the reliability of the subscale scores was at most 0.14 and 0.17 (rumination subscale, modeling and cross-validation samples, respectively) and was 0.00 for the magnification subscale scores.
A limitation of this study is the demographic homogeneity of the sample. Future studies could attempt to replicate the findings with a more racially and ethnically diverse sample. However, a number of factors support confidence in the findings. First, the sample was large and composed of patients with heterogeneous chronic pain problems seeking care from a large pain management center, and thus, the variable of interest, pain catastrophizing, was relevant to respondents. Second, the consistency in results across the modeling and cross-validation samples indicates that the results are likely to generalize to other populations of people with chronic pain. Finally, the use of bifactor analyses allowed quantification of what subscale scores add beyond the general factor scores.
Table 2. Exploratory factor analyses item loadings for helplessness (HELP), magnification (MAG), and rumination (RUM) questions (Q); based on modeling sample; one and three factors extracted.
Table 3. One- and 3-dimensional confirmatory factor analysis (CFA) results (based on the modeling sample).
Two published studies had findings that seem to conflict with our results. Craner et al. 10 applied hierarchical multiple regression analyses to evaluate the contributions of PCS subscale scores in accounting for pain and quality of life variables. When the subscales were entered as individual predictors, the helplessness and magnification scores, but not rumination scores, accounted for unique variance in several tested variables. In another study, Gilliam et al. tested the mediating impact of PCS subscale scores in treatment outcomes. 12 Improvement in helplessness scores proved to be the most consistent mediator in treatment outcomes. Magnification subscale scores had the least mediating effect. The findings of both of these studies highlighted potential weakness of the magnification subscale in comparison with the other subscales. In our study, the magnification subscale had the lowest interitem consistency of the subscales (alpha = 0.760 in the full sample) and the lowest subscale reliability after the general factor was extracted (0.00 in both modeling and cross-validation samples). Future research should evaluate whether these findings are replicable. Joint efforts between theory development and measurement science could further distinguish the clinical relevance of the PCS subscales in characterizing the impact of pain catastrophizing in individuals' pain experiences. Our findings did not support added value in subscale scores. From a measurement perspective, there seems to be little difference in what is being measured by the subscale items and what is being measured by the full scale.
However, it is possible that these subdomains are clinically meaningful but that the PCS (at least as presently constructed) fails to discriminate reliably among them. The clinically relevant question is whether the distinctions fail only at the measurement level or also at the theoretical level. One possibility is that the differences found in previous studies, although attributed to different pain catastrophizing domains, 10,12 actually reflect differences in the ability of individual items to discriminate among respondents. If one or more PCS subscales are composed of items with better psychometric properties (eg, items with higher discrimination), then it would not be surprising to find that the subscale composed of the better items is more responsive to change and more highly correlated with clinical anchors. This possibility could be evaluated by modeling the PCS item responses using an IRT model that estimates both item difficulty (intensity of the item) and item discrimination.
Our findings have implications for how the PCS is administered. If PCS item responses meet the other assumptions of a unidimensional IRT model, the PCS could comprise a calibrated item bank, and the items could be administered using CAT. In settings in which CAT is not feasible, a short-form version of the PCS could be constructed based on IRT modeling. Reducing the burden of measuring pain catastrophizing could increase its use in clinical settings, providing clinicians with greater understanding of the psychosocial context of their patients' pain. However, a study that calibrated PCS responses using a one-parameter IRT model (Rasch model) found disordered category responses for 2 items; for example, a response of "2" was associated with higher levels of catastrophizing than a response of "3." 41
Recently, a new measure of pain catastrophizing, the Concerns About Pain (CAP) scale, was developed by Amtmann et al. 1 This measure was developed using IRT analysis and can be administered using CAT or one of 3 short forms (2, 6, and 8 items). Development of the scale included extensive qualitative analyses, 2 the full bank has a reading level of 3.4, and a crosswalk is available to associate CAP scores with PCS scores. With these advances, researchers can consider both the CAP and the PCS for measuring pain catastrophizing.
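The point made above about item discrimination can be illustrated with a small sketch. It uses the two-parameter logistic (2PL) model for dichotomous items as a simplification; PCS items have 5 response categories, so an actual calibration would need a polytomous model such as the graded response model. The parameter values are hypothetical and are not estimates for any PCS item.

```python
import numpy as np

def two_pl_probability(theta, a, b):
    """Probability of endorsing an item under the 2PL model.

    theta: respondent trait level, a: discrimination, b: difficulty.
    """
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def two_pl_information(theta, a, b):
    """Fisher information of a 2PL item: a^2 * P * (1 - P)."""
    p = two_pl_probability(theta, a, b)
    return a**2 * p * (1.0 - p)

# Two hypothetical items with the same difficulty but different discrimination.
theta = np.linspace(-3.0, 3.0, 61)
info_low = two_pl_information(theta, a=0.8, b=0.0)
info_high = two_pl_information(theta, a=2.0, b=0.0)

print(f"peak information, a=0.8: {info_low.max():.2f}")
print(f"peak information, a=2.0: {info_high.max():.2f}")
```

Under this model an item's peak information equals a squared divided by 4, so a subscale built from more discriminating items would measure more precisely and could appear more responsive even if it targeted the same underlying construct.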

Disclosures
The authors have no conflicts of interest to declare.