Assessing Differences in How the CushingQoL Is Interpreted Across Countries: Comparing Patients From the U.S. and the Netherlands

Background: Cultural factors influence how individuals define, evaluate, and approach their quality of life (QoL). The CushingQoL is a widely used disease-specific questionnaire to assess QoL in patients with Cushing's syndrome. However, there is no information about potential cross-country differences in the way patients interpret the items on the CushingQoL. Thus, the current study examined if the CushingQoL is interpreted in the same way across nationalities. Methods: Patients from the U.S. (n = 260) and the Netherlands (n = 103) were asked to fill out the CushingQoL and a short demographics survey. Measurement invariance testing was utilized to explore whether or not the patient samples from the U.S. and the Netherlands interpreted items on the CushingQoL in the same way. Results: A two-subscale scoring approach was used for the CushingQoL. Model fit was good for the U.S. sample (e.g., CFI = 0.983; TLI = 0.979), as well as the Dutch sample (e.g., CFI = 0.971; TLI = 0.964). Invariance testing revealed that three of the 12 items on the CushingQoL were interpreted differently across the groups. These items are all related to psychosocial issues (e.g., irritable mood and worrying about one's health). Items assessing physical aspects of QoL did not vary across the U.S. and Dutch samples. Conclusions: Interpreting results from the CushingQoL requires careful consideration of country of residence, as this appears to impact the interpretation of the questionnaire.


INTRODUCTION
Cushing's syndrome (CS) is a rare disease that is characterized by chronic exposure to elevated cortisol levels (1). CS can have various causes, including adrenocorticotropic hormone (ACTH)-releasing pituitary adenomas, long-term high-dose glucocorticoid steroid use, adrenal tumors, or ectopic tumors (1). Patients with CS experience a number of physical symptoms [e.g., pain, easy bruising, and trouble sleeping (2)] and psychological symptoms [e.g., cognitive impairments, irritability, and impaired quality of life (QoL) (3-5)]. Upon remission, most patients report a reduction in the signs and symptoms of CS, although QoL typically continues to be impaired (5,6). It is therefore important to assess QoL accurately in patients with CS. The Cushing QoL questionnaire (CushingQoL) was designed specifically for patients with Cushing's syndrome (7).
The questionnaire contains 12 items, each with 5 response categories (depending on the item, the choices range either from "Always" to "Never" or from "Very Much" to "Not at All"). The items inquire about different aspects related to QoL that the patient may have experienced in the past 4 weeks. The CushingQoL can be scored either using a total score ranging from 0 to 100, with lower scores indicating a greater impact on QoL [see Webb et al. (7)], or using two different subscales (8). These subscales relate to physical problems and psychosocial issues. Questions in the CushingQoL related to the physical dimension include those such as "I bruise easily," and questions related to the psychosocial dimension include items such as "I have had to give up my social or leisure activities due to my illness." Table 1 contains all items in the two subscales, and more information about the structure of this scoring option can be found in Tiemensma et al. (8). For more information on the validity of this questionnaire, see Webb et al. (7). The CushingQoL was developed as a multi-language scale, with papers published using the English version [see e.g., (9)] and Dutch version [see e.g., (10)], among others. However, there are no studies looking into potential cross-country differences in the way patients interpret and answer the items on the CushingQoL. Validating a survey across cultures is an important component of assessing the overall scale performance and interpretation of individual items. Cross-cultural validation studies for surveys have been conducted in a variety of contexts, which include depression (11), personality (12), and the psychological impact of exercise (13), to name a few.
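As an illustration of the total-score option described above, the following is a minimal sketch of how 12 item responses could be rescaled to the 0 to 100 range. It assumes each item is coded 1 to 5, with 1 as the least favorable response, and that the raw sum is rescaled linearly. Neither the coding nor the rescaling is spelled out in this article, so both should be checked against Webb et al. (7) before use. The function name `cushingqol_total` is hypothetical.

```python
from typing import Sequence

N_ITEMS = 12                 # the CushingQoL has 12 items
MIN_CODE, MAX_CODE = 1, 5    # ASSUMED numeric coding of the 5 response categories


def cushingqol_total(responses: Sequence[int]) -> float:
    """Linearly rescale the raw item sum to 0-100 (lower = greater impact on QoL).

    Assumes every item is already coded so that 1 is the least favorable
    response and 5 the most favorable one.
    """
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")
    if any(r < MIN_CODE or r > MAX_CODE for r in responses):
        raise ValueError("each response must be coded between 1 and 5")

    raw = sum(responses)
    lowest = N_ITEMS * MIN_CODE    # raw sum if every item gets the worst response
    highest = N_ITEMS * MAX_CODE   # raw sum if every item gets the best response
    return 100.0 * (raw - lowest) / (highest - lowest)
```

Under these assumptions, a patient choosing the middle category on every item would receive a total score of 50.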
The U.S. Surgeon General has emphasized the importance of measuring the same construct across nationalities (14). Cultural factors influence how individuals define, evaluate, and approach their health problems. An observed mean difference between groups may reflect a true difference between the groups on a health outcome. However, it is also possible that such a discrepancy results from a difference in how individual items are interpreted across nationalities. Measurement invariance testing is a statistical process for assessing whether different groups interpret items in the same way. If a questionnaire is measurement invariant (MI) across groups, the items on the questionnaire are interpreted in the same way across the groups. In other words, the items are tapping into the same underlying construct (e.g., QoL) across groups.
Following the guidelines of the U.S. Surgeon General (14), the aim of the current study is to examine if the CushingQoL is being interpreted in the same way across nationalities, or if there are elements of the items that vary in interpretation.

Subjects
Participants were recruited from the United States (U.S.) and the Netherlands. The study protocol was approved by the University of California, Merced Institutional Review Board. All patients provided digital informed consent in accordance with the Declaration of Helsinki before filling out the survey.
For the U.S. sample, participants were 260 patients who were invited to participate through the Cushing's Support and Research Foundation's (CSRF) listserv and Facebook page. Patients were eligible for the current study if they were over 18 years of age, in remission from CS, and living in the U.S. at the time of completing the survey. Patients were asked to complete the CushingQoL (English version) and a demographics survey. The majority of participants were female (91.1%, n = 239). Participants were on average 49.6 years old (SD = 13.02).
For the Dutch sample, participants were 103 patients recruited through the Dutch Adrenal Association (NVACP). NVACP members received an email through the NVACP listserv, with a short description of the study and a link to the online survey. Patients were eligible for the current study if they were over 18 years of age, in remission from CS, and living in the Netherlands at the time of completing the survey. Patients were asked to complete the CushingQoL (Dutch version) and a demographics survey. The majority of participants were female (87.4%, n = 90). Participants were on average 53.17 years old (SD = 12.57).

Procedure
Patients in both samples first received a digital consent form, which described the nature of the study. After reading and signing the consent form, they were directed to the online survey, which included the 12 CushingQoL items as displayed in the original paper-and-pencil version published by Webb et al. (7). To mimic the paper-and-pencil version as closely as possible, the survey was set in a larger font and all of the items were kept on the same page (as opposed to presenting one item at a time). Upon completion of the survey, patients clicked to the next page, where they received a demographics survey.

Translation Process for Converting the CushingQoL to Web-Based
As mentioned, the CushingQoL was given to patients online as opposed to the original paper-and-pencil version. The translation process for converting the paper-and-pencil version to an online version largely adhered to the guidelines presented by the ISPOR ePRO good research practices task force report (15). In these guidelines, the task force recommends performing a cognitive debriefing to ensure accurate translation from the original questionnaire format. In the current study, a patient was recruited to participate in a cognitive interview, which was used to assess the validity of the online version of the CushingQoL.

Statistical Analyses
All analyses were conducted in Mplus version 7.4 (16), and the CushingQoL items were treated as categorical variables given that there were 5 response choices for each item. Collinearity was evaluated through item correlations on the full sample of valid (i.e., non-missing) responses. No problematic levels of collinearity were detected. Two questions reflecting the effect of CS on daily activities (items 9 and 10; see Table 1) were highly correlated (r = 0.783 for the full sample), but this level of correlation was not severe enough to cause problems in the invariance phase. Therefore, all models were estimated using all CushingQoL items.
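The collinearity screen described above can be sketched as follows. This is not the procedure used in the study: it assumes Pearson correlations, listwise deletion of respondents with any missing item, and a purely hypothetical cutoff of 0.85. The article reports neither the type of correlation nor the cutoff, and the function name `flag_collinear_pairs` is introduced here only for illustration.

```python
import numpy as np


def flag_collinear_pairs(responses, threshold=0.85):
    """Return (item_i, item_j, r) for item pairs with |r| >= threshold.

    responses: array-like of shape (n_respondents, n_items), np.nan = missing.
    Item numbers in the output are 1-based. The 0.85 default is a
    hypothetical cutoff, not one taken from the article.
    """
    data = np.asarray(responses, dtype=float)
    # Listwise deletion: keep only respondents with no missing items.
    complete = data[~np.isnan(data).any(axis=1)]
    # Pearson correlations between items (columns).
    corr = np.corrcoef(complete, rowvar=False)

    flagged = []
    n_items = corr.shape[0]
    for i in range(n_items):
        for j in range(i + 1, n_items):
            if abs(corr[i, j]) >= threshold:
                flagged.append((i + 1, j + 1, float(corr[i, j])))
    return flagged
```

With this hypothetical cutoff, a pair correlated at r = 0.783, such as items 9 and 10, would not be flagged.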
To test whether a questionnaire is MI, an analysis called multiple-group confirmatory factor analysis (MGCFA) can be used. MGCFA assesses whether or not items are related to constructs in the same way across the groups. An iterative modeling approach is implemented, where several phases of the model are estimated (17). The first phase examines the situation where the groups are allowed to have entirely different item-level interpretations (i.e., the groups can interpret the construct in entirely different ways, and this drives different response patterns for the items across groups). The subsequent phases of the process iteratively add restrictions in the MGCFA that force certain elements of the model to be the same for both groups. By forcing elements to be the same for both groups, model fit can be examined at each phase of the MI process to determine exactly where the groups are the same (or different) in their interpretation of the items.
The initial estimation of the factor structure followed Tiemensma et al. (8), where two subscales in the CushingQoL were specified. Nine items loaded on a subscale representing the presence of psychosocial issues. The other three items loaded onto a second subscale representing the presence of physical problems (see Table 1).
In the next phase, measurement invariance was evaluated by estimating several multiple-group CFA models. This process was used to compare the U.S. and Dutch samples and assess whether the same subscale structure and model results exist across the groups. Following the guidelines prescribed by Meredith (17), three models were estimated and compared under different levels of invariance (i.e., making the subscale structure in the CFA model increasingly similar for the two groups): (Phase 1) configural invariance, where the subscales are measured by the same items across the groups, but nothing is constrained across groups in the model at this point; (Phase 2) metric invariance, where the strengths of the relationships between the items and the subscales (i.e., factor loadings) are constrained to be equal across groups; and (Phase 3) scalar invariance, where the factor loadings and the estimates linked to the individual item response categories (i.e., item response thresholds; e.g., "Always" to "Never") are held equal across the U.S. and Dutch groups. There is an optional fourth phase of invariance testing. Specifically, if any of the model comparisons indicated a significant worsening of model fit, the option of a partially invariant model would be explored (18). In this case, some (but not all) of the factor loadings or item response thresholds are allowed to vary between the U.S. and Dutch samples. This would indicate that certain questions are interpreted and answered in a different way by U.S. and Dutch participants.
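The constraints added in each phase can be written out as a sketch. The notation below is introduced here for illustration and is not taken from the article: it assumes a standard latent-response formulation for ordered categorical items, and it omits the identification constraints that any particular software package would impose.

```latex
% Latent response of respondent i in group g (g = US or NL) on item j,
% where \eta_i^{(g)} is the score on the subscale that item j loads on:
y_{ij}^{*(g)} = \lambda_j^{(g)} \, \eta_i^{(g)} + \varepsilon_{ij}^{(g)}

% The observed category c = 0, ..., 4 is determined by four thresholds per item,
% with \tau_{j,0}^{(g)} = -\infty and \tau_{j,5}^{(g)} = +\infty:
y_{ij}^{(g)} = c \iff \tau_{j,c}^{(g)} < y_{ij}^{*(g)} \le \tau_{j,c+1}^{(g)}

% Phase 1 (configural): same item-to-subscale pattern in both groups;
% \lambda and \tau are estimated freely in each group.

% Phase 2 (metric): equal factor loadings
\lambda_j^{(\mathrm{US})} = \lambda_j^{(\mathrm{NL})} \quad \text{for all } j

% Phase 3 (scalar): Phase 2 plus equal thresholds
\tau_{j,c}^{(\mathrm{US})} = \tau_{j,c}^{(\mathrm{NL})} \quad \text{for all } j,\; c = 1, \dots, 4

% Phase 4 (partial): Phase 3, with the threshold equality relaxed
% for a subset of items only.
```

In this notation, a worsening of fit between Phase 2 and Phase 3 points to thresholds, rather than loadings, as the source of the group difference.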
Several model fit indices are reported in the analysis section. The χ² difference test [p < 0.05 (19)] was used to compare models at each step of invariance testing. The comparative fit index [CFI; for use in invariance testing, see (20)] and the Tucker-Lewis index [TLI (21)] are fit measures where values closer to 1.0 are considered to represent good fit (as opposed to values closer to zero). The root mean square error of approximation [RMSEA (22)] was also used to examine the absolute fit of each model, and values closer to zero represent better fit (compared to values closer to 1). These measures were included because the χ² test is known to be sensitive to a variety of assumption violations, as well as to larger sample sizes (23,24). Thus, it is recommended to use several different methods for assessing fit and to examine whether or not they coincide.
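To make the direction of these indices concrete, the sketch below computes CFI, TLI, and RMSEA from the χ² values of a fitted model and of a baseline (independence) model, using the common single-group textbook definitions. It is illustrative only: the values reported in this article come from Mplus, whose estimator-specific and multiple-group versions of these indices can differ from the simple formulas shown here. The function names are hypothetical.

```python
import math


def cfi(chi2_model, df_model, chi2_base, df_base):
    """Comparative fit index; values closer to 1.0 indicate better fit."""
    d_model = max(chi2_model - df_model, 0.0)   # model noncentrality
    d_base = max(chi2_base - df_base, 0.0)      # baseline noncentrality
    denom = max(d_model, d_base)
    return 1.0 if denom == 0.0 else 1.0 - d_model / denom


def tli(chi2_model, df_model, chi2_base, df_base):
    """Tucker-Lewis index; values closer to 1.0 indicate better fit."""
    ratio_base = chi2_base / df_base
    ratio_model = chi2_model / df_model
    return (ratio_base - ratio_model) / (ratio_base - 1.0)


def rmsea(chi2_model, df_model, n):
    """Root mean square error of approximation; closer to 0 is better.

    Uses the (n - 1) convention in the denominator; some sources use n.
    """
    return math.sqrt(max(chi2_model - df_model, 0.0) / (df_model * (n - 1)))
```

A model whose χ² does not exceed its degrees of freedom yields CFI = 1 and RMSEA = 0 under these definitions, which is why the indices are read against 1 and 0, respectively.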

Invariance Testing (Tables 3, 4)
First, the two-scale factor structure was examined for each group separately. The model fit indices all suggested that the two-scale factor solution reflected the response patterns in the data well for both groups. Phase 1, the configural model, fit the data well (see Table 3). This implies that the way in which the items are related to the two subscales is the same across both groups. However, the item-level interpretations are still allowed to be completely different in this phase.
The second phase, testing metric invariance, did not result in a significant decrease in model fit, as shown by the nonsignificant χ² difference test. This implies that the strength of the relationship between the items and the subscales is the same across the two groups.
The third phase, testing scalar invariance, did result in a significant decrease in model fit, as shown by the significant χ² difference test. This implies that U.S. and Dutch participants with the same subscale score do not have exactly the same underlying answer pattern on the individual subscale items. Consequently, several partially invariant models (phase 4) were explored.
The fourth phase focused on identifying thresholds (i.e., item response categories) that caused the biggest decrease in model fit. Factor loadings were not considered in these partially invariant models, as phase 2 indicated that they could be held equal across the two groups.
Four partial models were estimated. The first three models examined releasing additional thresholds based on changes in the χ² value, starting with those for item 10, followed by those for item 12 and item 5 (see Table 1). Modification indices were used to identify the thresholds that would lead to the biggest improvement in model fit if they were allowed to differ between the groups. A modification index reflects the change in χ² that would occur if a certain threshold were allowed to vary between the two groups; higher modification indices indicate a larger decrease in χ² and thus a larger improvement in model fit. These three items were all part of the Psychosocial Issues subscale of the CushingQoL. For each of these models, the χ² difference test indicated that the model still fit the data worse than the metric model from phase 2. For the fourth partial model, the added restriction (on a residual variance) did not impact substantive interpretation, so the analysis process was concluded. Thus, the model that best reflected both substantive knowledge and statistical fit to the data only allowed the thresholds of items 5, 10, and 12 to vary across groups. Table 4 illustrates that these three items had different response patterns across the two groups. It is important to note that none of the items on the Physical Problems subscale were allowed to vary, which indicates that for this subscale, participants from both countries did interpret and respond to the items in the same way.
To assess the impact of the noninvariance on the overall composition of the subscales, we compared two versions of the model to see if the U.S. and Dutch participants would differ. The first version assumed all item responses were interpreted the same across groups (i.e., the item thresholds were the same; also called scalar MI, phase 3), and the second version allowed the item responses for items 5, 10, and 12 (all from the Psychosocial Issues subscale) to vary across the U.S. and Dutch participants (also called partial MI, phase 4) (25). The comparison was specifically to assess whether the subscale means were comparable across the two groups under these modeling conditions. The U.S. and Dutch participants did not differ in their mean subscale scores for the first version of the model (i.e., the scalar model, phase 3; B = 0.014, SE = 0.12, p = 0.902), or for the second version (i.e., the partial model, phase 4; B = 0.068, SE = 0.12, p = 0.561). The MI results indicated that the CushingQoL's Psychosocial Issues subscale does not measure exactly the same construct in U.S. and Dutch samples. However, this final assessment uncovered that the analysis of subscale mean differences across groups led to the same conclusion (i.e., there was no significant mean difference) even if the model (wrongly) assumed that items were interpreted exactly the same across groups (i.e., the scalar model, phase 3). This indicates that the impact of the difference in item interpretation between these groups might be limited in scope.
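The reported p-values are consistent with a two-sided Wald z-test of each estimated mean difference B against its standard error. The sketch below assumes that this is the test behind the reported values, which the article does not state explicitly; the function name `wald_test` is hypothetical.

```python
import math


def wald_test(estimate, se):
    """Two-sided Wald z-test: returns (z, p) for H0: parameter = 0."""
    z = estimate / se
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Rounded estimates reported for the scalar (phase 3) and partial (phase 4) models.
for label, b, se in [("scalar", 0.014, 0.12), ("partial", 0.068, 0.12)]:
    z, p = wald_test(b, se)
    print(f"{label}: z = {z:.3f}, p = {p:.3f}")
```

Because B and SE are rounded in the text, the recomputed p-values are close to, but not identical with, the reported 0.902 and 0.561. Both are far above 0.05, which matches the conclusion that the subscale means do not differ under either model.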

DISCUSSION
The aim of the current study was to examine whether the CushingQoL is being interpreted in the same way across nationalities, or if there are elements of the items that vary in interpretation. Following the U.S. Surgeon General recommendations, it is important to assess whether the same construct (i.e., QoL) is measured across nationalities (14). Measurement invariance testing was utilized to explore whether or not the patient samples from the U.S. and the Netherlands are interpreting items on the CushingQoL in the same way. The current study found that the U.S. and Dutch groups differed in their interpretation of some of the CushingQoL items. More specifically, they differed in their endorsement of response categories for three items. The Dutch sample was more likely to endorse two of these items (items 5 and 10). For both of these items, Dutch participants were more likely to indicate that they experienced lingering effects of their illness on their daily lives. The U.S. sample was more likely to endorse item 12; they were more likely to express worry about their future health.
These results should be interpreted as exploratory in nature. The procedure for assessing partial MI is data-driven, and as such, the results found here might not replicate in new samples. However, we assessed whether there were actual differences in response patterns between the U.S. and Dutch samples (see Table 4) before allowing certain items to vary between the two groups. This limits the possibility that the findings were due to chance. Cross-validation of the findings of the current study would provide additional support.
Some of the differences observed could be due to dissimilar sociodemographic and clinical characteristics between the U.S. and Dutch samples. Specifically, the Dutch sample was significantly older, less educated, and reported a longer duration of remission compared to the U.S. sample. In addition, Dutch patients reported a higher incidence of both adrenalectomy and postoperative radiotherapy, and reported a higher usage of hydrocortisone and fludrocortisone than the U.S. sample, which is likely due to the differences observed in treatment modalities. These sociodemographic and clinical characteristics could potentially lead to a different interpretation of the items related to lingering effects of Cushing's syndrome on their daily lives. Because the data were self-reported, it is possible that information related to diagnosis, remission status, or treatment (e.g., hormonal supplementation) is inaccurate if a patient unknowingly misunderstood the information provided by their doctor. In addition, the degree of hormone supplementation is unknown.
There are several clinical implications related to these findings. First, researchers conducting a multi-nation study should be mindful to ensure that analyses examining the CushingQoL are separated by nation rather than combining patients across multiple nations. Given the results of the current study, it cannot be assumed that the CushingQoL is being interpreted in the same manner across nations. Thus, it is imperative to separate results by nation when discussing QoL and other research implications. In addition, when clinicians compare CushingQoL scores for individual patients to the literature for score interpretation, it is important that the comparison is made to a body of literature reflecting the country of residence of the patient being examined. Finally, researchers interested in designing interventions surrounding QoL should create the intervention based on individuals residing in the same country where the intervention will be implemented to ensure comparable interpretation of the facets of QoL.
Future studies can examine other nations where the CushingQoL is commonly implemented [e.g., Spain, France, Germany, and Italy (7,26)] to investigate whether differences in interpretation emerge across groups. It may be that differences will be less substantively impactful between countries close in proximity, with more similar health care systems, and with more overlapping cultures overall. However, a full assessment using the MI process would uncover the exact similarities (and differences) across these nations commonly implementing the CushingQoL. We note that the extent to which we can comment on broader issues related to cross-cultural interpretations of QoL is limited to this patient population. Further research would be needed to assess whether QoL interpretations differ across cultures in a more comprehensive context that extends beyond this patient population.
In summary, the CushingQoL is a valuable tool for assessing the QoL of patients with CS. Interpreting results from this tool requires a careful consideration of country of residence, as this appears to impact the interpretation of the questionnaire.

AUTHOR CONTRIBUTIONS
All authors wrote and edited the manuscript and approved the final draft. In addition, SW performed the data analysis and interpreted the results, JT conceived the study and collected the data, and SD assisted with data analysis and interpretation.