Latent structure and measurement invariance of the Hospital Anxiety and Depression Scale in cancer outpatients

Background/Objective The aim of the present study was to compare competing psychometric models and analyze measurement invariance of the Hospital Anxiety and Depression Scale (HADS) in cancer outpatients. Method The sample included 3,260 cancer outpatients. Latent structure of the HADS was analyzed using confirmatory factor analysis (CFA) with robust maximum likelihood estimation (MLR). Measurement invariance was tested for age, time of response, gender, and cancer type by comparing nested multigroup CFA models with parameter restrictions. Results Except for the one-factor solutions, all models showed acceptable model fit and measurement invariance. The model with the best fit was the originally proposed two-factor model with exclusion of two items. The one-factor solutions showed inacceptable model fit and were not invariant for age and gender. Conclusions The HADS has a robust two-factor structure in cancer outpatients. We recommend excluding item 7 and 10 when screening for anxiety and depression.


Introduction
Depression and anxiety disorders are the most frequently diagnosed psychiatric complications in patients with cancer, with prevalence rates being two to four times higher than in people without a cancer diagnosis (Pilevarzadeh et al., 2019;Unseld et al., 2019). Psychiatric comorbidities greatly affect patient quality of life and treatment adherence, and negatively impact physical health outcomes and mortality Unseld et al., 2021). Recognition and treatment of psychiatric disorders is crucial in multidisciplinary care (Pitman et al., 2018;Tsaras et al., 2018). Routine screening instruments are particularly valuable for early detection. In our study, we analyzed one of the most frequently used screening tools, the Hospital Anxiety and Depression Scale (HADS, Zigmond & Snaith, 1983).
The HADS was designed for routine screening in outpatient hospital settings and contains 14 items, seven each for anxiety and depression, rated on a four-point Likert scale. Validity studies in cancer patients showed favorable results in many languages (e.g. Annunziata et al., 2020;Hyland et al., 2019;Wondie et al., 2020). Due to its brevity and the availability in multiple languages, the HADS is frequently used in routine screening and international multicenter studies. However, analysis of the latent structure of the HADS has yielded some controversial results (Cosco et al., 2012), which also calls into question the scoring procedure to reliably screen for anxiety and depression.
Initially, the HADS was designed as a two-factor scale. Although not proposed by the authors of the HADS, the total score including all 14 items is frequently used as a measure of psychological distress. Some studies have also proposed and tested a variety of three-factor models with correlated or uncorrelated factors and a hierarchical or non-hierarchical structure (Caci et al., 2003;Dunbar et al., 2000;Friedman et al., 2001) following the tripartite theory, where anxiety and depression are characterized by both shared and unique features (Clark & Watson, 1991). The present study examines twelve models of the HADS as depicted in Table 1. A systematic review and meta-CFA described the latent structure of the HADS as unstable due to variation in study results (Cosco et al., 2012;Norton et al., 2013). However, we hypothesize that the variability in results is influenced mainly by three aspects: (1) characteristics of the sample, (2) single problematic items, and (3) the statistical methods used for determining factor structure.
First, examining the characteristics of samples in individual studies, some samples seem to yield a reasonable stable factor structure of the HADS. In patients with various types of cancer, a two-factor structure was supported by the majority of studies in different languages (e.g. Hyland et al., 2019;Moorey et al., 1991;Muszbek et al., 2006). In contrast, in patients with heart disease, most studies supported a three-factor solution (e.g. Emons et al., 2012;Martin et al., 2008). Studies on outpatients (i.e., the original target population of the HADS), including cancer outpatients, also supported a two-factor structure of the HADS (e. g. Moorey et al., 1991;Muszbek et al., 2006;Wiriyakijja et al., 2020).
Second, across studies, some items have shown to be difficult to allocate to one of the two factors, depression or anxiety. Considering that the HADS contains 14 items only, having one or two items that do not measure the same latent construct can have a considerable effect on the stability of the factor structure. Item 7 of the initial anxiety factor ('I can sit at ease and feel relaxed') often showed similar loadings on both factors, anxiety and depression, or was part of a third factor related to 'restlessness' (Moorey et al., 1991;Muszbek et al., 2006;Nezlek et al., 2021). It was argued that this specific item can overlap with problems caused by a somatic illness, e.g. in persons with spinal cord injury (Woolrich et al., 2006) or coronary heart disease (Martin et al., 2008). Another problematic item found in previous research was item 10 ('I have lost interest in my appearance') of the depression factor (Caci et al., 2003;Emons et al., 2012;Smith et al., 2006).
The third aspect contributing to an unstable factor structure is the statistical methods used. HADS data are ordinally scaled and usually positively skewed. Therefore, the Maximum Likelihood (ML) method as the default estimator for CFA should not by applied. Some studies analyzing the latent structure of the HADS did account for ordinality and/or skewness of data. However, the majority did not, thus applying methods not suitable for the data quality of the HADS, leading to potentially controversial results.
Measurement invariance is an indispensable prerequisite for a scale used in heterogeneous populations like cancer patients. Measurement invariance assumes that the same latent dimensions are measured, and that items function the same way in different groups, e.g. according to gender or age. If a scale is invariant its results can reliably be compared between groups. There are different levels of measurement invariance with increasing restrictions in parameters: (1) Configurational invariance assumes that the same factor structure holds in all groups, (2) metric (weak) invariance assumes that factor loadings are identical between groups, i.e., in every group each item contributes to the construct in the same way, (3) scalar (strong) invariance assumes that loadings and intercepts are identical. This is the necessary prerequisite for an instrument to compare mean scores across groups. Measurement invariance of the HADS has hardly been tested in cancer patients. A comparison of German an Ethiopian patients found only metric invariance (Wondie et al., 2020).
Our study aims at determining the latent structure of the HADS in a large sample of cancer outpatients using analysis methods suitable for the data. We include theoretically and empirically derived models, and test for invariance according to age, gender, cancer type and time of response (cohort effect), which, to the best of our knowledge, has never been performed in a sample of cancer patients. Based on this analysis, we propose an optimal factor structure and scoring procedure for the HADS to more reliably screen for anxiety and depression in cancer outpatients.

Method Participants
The final sample for statistical analysis included 3,260 cancer outpatients (50.7% women). Age ranged from 18 to 92 years with a mean age of 58.41 (SD = 14.58). Cancer diagnosis was available for a subsample of n = 2,562 (78.6%) participants. The most frequently diagnosed solid tumor was

Instruments
Questionnaires included the HADS (Zigmond & Snaith, 1983) and a sociodemographic profile. The HADS is a 14-item screening instrument for anxiety (7 items) and depression (7 items). All items are rated on a 4-point Likert scale. Item numbers and wording are depicted in Table 3.

Procedure
The present study was embedded in a larger ongoing research project performed at the outpatient clinic [research site blinded for review], aiming to assess psychosocial aspects in cancer patients. It was a single-center study. Data used in this study were collected from 2013 to 2021. In this time period, the HADS was used to screen for anxiety and depression at the outpatient clinic. Patients treated at the clinic were invited to participate upon the following inclusion criteria: (1) confirmed diagnosis of cancer, (2) age 18, (3) capacity to consent, and (4)

Statistical methods
For analyzing the latent structure of the HADS, we used confirmatory factor analysis (CFA), which is usually estimated via a maximum likelihood (ML) approach. However, this estimator assumes normally distributed continuous data and is less suited for ordinal and potentially skewed data. Two possible alternatives are the robust ML estimation (MLR) and the weighted least squares mean and variance adjusted estimator (WLSMV; Muth en, 1984). MLR can be used with skewed distributions but assumes continuous data. WLSMV can be used with ordinal data but assumes normal distribution of the underlying latent dimension. We decided to use MLR, a maximum likelihood estimation with robust (Huber-White) standard errors and a scaled test statistic that is (asymptotically) equal to the Yuan-Bentler test statistic (Yuan & Bentler, 2000). This decision was based on the following reasons: (1) the latent dimensions anxiety and depression are not normally distributed in the population, thus basic assumptions of the WLSMV are violated.
(2) Using MLR on the 4-point Likert scale of the HADS can lead to underestimated factor loadings (Rhemtulla et al., 2012). However, this potential bias is constant across all estimated models in our analysis and will not affect model comparisons.
(3) MLR estimates allow for the use of DCFI, an effect size to judge measurement invariance, which is not possible for WLSMV (Sass et al., 2014). (4) MLR was shown to have better Type I error rates for the model tests (Li, 2014). We used robust CFI and robust RMSEA as fit indices. Measurement invariance was tested by comparing nested multigroup CFA models with restrictions in parameters. Since Chi-Squared tests have been shown to be overly sensitive, especially in large samples (Cheung & Rensvold, 2002;Sass et al., 2014), we used DCFI which is less dependent on sample size and model complexity (Cheung & Rensvold, 2002). To evaluate measurement invariance, a commonly used criterion is a change in CFI by -0.01 (Putnick & Bornstein, 2016). This process is very well established for ML estimation and also valid for MLR estimation (Sass et al., 2014).
In total, twelve models were tested. These included ten models established by previous studies, and two newly suggested models. For designing the new models, we used the originally proposed structure and excluded one (item 7) or two items (item 7 and 10) that had been found problematic in a number of psychometric analyses of the HADS. All models and item allocations are depicted in Table 1.
We tested measurement invariance for four dichotomous grouping variables: age (< 60 years vs. 60 years), time of response (January 2, 2013 À August 16, 2016 vs. August 17, 2016 À May 25, 2021), gender (male vs. female), and cancer type (solid tumor vs. hematological cancer). Age and gender were examined to identify differences due to sociodemographic characteristics of the patients. Time of response provides insights into possible cohort effects. Because the type of physical illness may affect the presentation of psychiatric comorbidities, we also tested for differences by cancer type.
First, configurational invariance was established by fitting the models to each group individually. Second, metric (weak) invariance was established by fitting multigroup models with factor loadings constrained to be equal across groups. Third, scalar (strong) invariance was established by fitting multigroup models with factor loadings and intercepts constrained to be equal across groups. All analysis were performed in R (R Core Team, 2020) using the packages lavaan  and semTools .

Results
Descriptive statistics of HADS items are shown in Table 3. All items were positively skewed (values concentrated at the lower end of the scale), violating the distributional assumptions of the MLR estimation.
Model fit of all considered models is given in Table 1. Only the one-factor solutions had an unacceptable model fit, indicated by both CFI and RMSEA values. The two models that showed a good fit (RMSEA values < 0.05) were the ones with the original structure and exclusion of one or two items. All remaining models, including the original structure, showed acceptable CFI and RMSEA values. The best fitting model according to CFI and RMSEA was the original model after excluding items 7 and 10. Internal consistencies for this model were a = .828 for the anxiety and a = .867 for the depression factor.
Measurement invariance results for all models are presented in Figure. 1. Ten of the twelve models tested achieved scalar measurement invariance for all four covariates, i.e. age, time of response, gender, and cancer type. Only the one-factor solutions were not invariant according to age and gender. Table 4 details test results for the model with the best fit in the total sample. Further, the CFA estimation of the Dunbar three-factor model and the measurement invariance models of the Caci three-factor model initially yielded invalid solutions. The covariance matrix of latent variables was not positive definite for both models. Therefore, for the Dunbar model, the correlation between factors had to be constrained to a value smaller than 0.99. For the Caci model, the correlation between factors had to be constrained to a value of smaller than 0.94. The necessity of these constrains indicate that two of the three factors in these models measure the same construct.

Discussion
The present study analyzed the factor structure of the HADS in cancer outpatients and examined measurement invariance of twelve different model specifications according to age, time of response, gender, and cancer type. The best fitting model was a 12-item model with the originally proposed item allocation and excluding items 7 and 10. Furthermore, the initially proposed twofactor structure with 14 items (Zigmond & Snaith, 1983) had an acceptable model fit and can thus also be used in cancer outpatients.
The one factor models showed unacceptable model fits and were not invariant for gender and age (recall that a onefactor solution was not initially proposed for the HADS). The Figure 1 Measurement invariance for twelve models of the HADS according to age, time of response, gender, and cancer type. The figure shows measurement invariance of each of the twelve models tested. To achieve measurement invariance, DCFI should be > -0.01. This cut-off is indicated by dotted lines. If the bars plotted cross this line the model is not invariant for the respective grouping variable. Only the one-factor models were not invariant for age and gender. For all other models measurement invariance can be assumed. first study suggesting a one-factor solution was conducted on a small sample of 228 French cancer in-patients (Razavi et al., 1989). A probabilistic study using Rasch analysis also identified one factor, but only after exclusion of three unscalable items (Smith et al., 2006). Other studies testing the one-factor solution of the HADS also found an unacceptable model fit (Albatineh et al., 2021). Although hierarchical models with a higher order distress factor may justify the computation of a total score, these models are not practical in a clinical setting because scoring can be overly complicated (Martin et al., 2008) and specific diagnostic information about whether a patient has symptoms of anxiety, depression or both is lost when using a total score (Emons et al., 2012). Two of the three-factor models we tested could only be analyzed when adding constraints to avoid latent factor correlations larger than 1. As already discussed by Caci and colleagues (2003), the first analysis of the tripartite model by Dunbar and colleagues (2000) also produced high correlations between two of the three factors in the model. Overall, this indicates that two of the three factors measure the same dimension and should not be regarded or scored as distinct constructs. The tripartite theory of anxiety and depression (Clark & Watson, 1991) has its merits, but may not be applicable to the HADS. Most importantly, the HADS was not designed based on this model. Therefore, it may not be possible to reproduce the tripartite model with the 14 HADS items.
Our results support the use of the HADS in cancer outpatients and contradict the voices criticizing the HADS as unstable. Previous studies did not comprehensively consider three important aspects in their unfavorable evaluation of the HADS: (1) characteristics of the sample, (2) single problematic items, and (3) statistical methods used for determining the factor structure. A meta CFA including 21 studies with various patient and community samples found a bifactor solution of the HADS, with a strong general distress factor and two uncorrelated anxiety and depression factors (Norton et al., 2013). However, this study applied analysis methods that did not account for skewness of the data, which may have biased results. Furthermore, studies with a number of different patient-and community samples were included. Since there are indications that the factor structure of the HADS may vary in samples with different characteristics or with different physical conditions, like heart disease or cancer, the results may not reflect the structure of either the included samples.
One point of criticism voiced about the HADS is the omission of important somatic symptoms of depression, such as changes in appetite and sleep disturbance (Coyne & van Sonderen, 2012). As in cancer patients, these symptoms can easily be associated with the illness and/or treatment of cancer, omitting these aspects in screening for depression may be a strong advantage of the HADS in this population, as well as in other patient groups. Zigmond and Snaith (1983) initially introduced the HADS as a scale with eight mandatory items (1, 3, 5, 9 for anxiety, and 2, 4, 6, 12 for depression) and six additional items. Various previous studies suggested omitting unscalable items of the HADS to increase reliability of the two constructs (Caci et al., 2003;Emons et al., 2012;Martin et al., 2008). Our study data indicates that, for cancer outpatients, items 7 and 10 should be excluded when screening for anxiety and depression.
Assessing psychiatric comorbidities in cancer patients using well-evaluated instruments like the HADS or the Brief Symptom Inventory (Calderon et al., 2020) is crucial for providing appropriate psychosocial support. Given the high prevalence of psychiatric comorbidities in cancer patients (Unseld et al., 2019), treatment programs should be enforced (Ichikura et al., 2020), especially in underserved patient groups including patients with low socioeconomic status . However, there are also well-evaluated tools for assessing positive mental health aspects, including resilience (Alarc on et al., 2020) and growth (Oliveira et al., 2021). These should receive additionally consideration in the context of holistic, personalized cancer care.

Limitations
We only tested models suitable for routine clinical practice, thus allowing a straightforward scoring procedure. We did not include hierarchical models in our analysis, because they have to be scored using a sophisticated scoring algorithm which contradicts the intended use of the HADS and is not feasible in clinical practice. Furthermore, for cancer type, we only compared patients with hematological malignancies and solid tumors. No further comparisons between different cancer entities, e.g. breast cancer or lung cancer, were conducted. As a single-center German-language study, our results need to be validated in other cancer-outpatient samples and other languages. However, the large sample size of 3,260 people in the present study supports the reliability and robustness of our results.

Conclusions
The HADS has a stable two-factor structure in cancer outpatients. Even though the initially proposed structure (Zigmond & Snaith, 1983) can be applied, we recommend excluding items 7 and 10, thus reducing the HADS to twelve items to more reliably screen for anxiety and depression.

Funding
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.