Understanding the composite dimensions of the EQ-5D: an experimental approach

The EQ-5D(-5L) includes two composite dimensions: “Pain or Discomfort” (P/D) and “Anxiety or Depression” (A/D), which involve an inherent ambiguity. Little is known about how these composite dimensions are interpreted across contexts where (i) individuals self-report their own health; and (ii) individuals value stylised health states. We detail the nature of the ambiguity and present experimental evidence from two large online surveys (n=1007 and n=1415). In one survey, individuals reported both their current health and their health at the time they felt the worst because of their health. In the other, they valued stylised EQ-5D states using Discrete Choice Experiments with duration as an attribute. In both surveys, participants were randomised into treatments in which the presentation of one of the composite dimensions was altered, or a control. Our results suggest: (1) In self-report, use of the composite dimensions differs across the dimensions, with P/D used mainly to report Pain, but A/D used mainly to mean the more severe component of Anxiety and Depression. (2) In valuation, Pain was perceived to be worse than Discomfort at the same level, and Depression was perceived to be worse than Anxiety at the same level. (3) In valuation, the composite dimension P/D was interpreted to mean Pain, whilst the composite dimension A/D was interpreted to lie between Anxiety and Depression. We conclude that care must be taken when interpreting responses to existing health (or wellbeing) descriptive systems that rely on composite dimensions, and that caution should be applied when designing new ones.


Introduction
dimensions are actually used by members of the public when self-reporting their experienced health.
Furthermore, in the context of health state valuation where individuals value stylised health states described using EQ-5D, it is not clear how they interpret different levels of the composite dimensions. Someone presented with a health state including moderate Anxiety or Depression may interpret this as moderate anxiety and no depression, or moderate depression and no anxiety, or some other combination. If systematic differences exist in the way the composite dimensions are used between these self-report and valuation contexts, then health state values used in economic evaluations would be systematically biased. For example, it is conceivable that individuals with moderate anxiety and no depression self-report "moderate Anxiety or Depression" while individuals valuing a stylised health state with "moderate Anxiety or Depression" interpret this as moderate depression and no anxiety. If so, and if moderate depression is considered to be worse than moderate anxiety, there will be systematic overvaluation of health states involving moderate Anxiety or Depression. Further discussion of the possible interpretations and use of the EQ-5D composite dimensions is provided below. The key point is that there are multiple logically consistent but mutually incompatible interpretations of a given severity level of a composite dimension.
This paper examines how the composite dimensions are used in the contexts of self-reporting own health and of valuation exercises. We take an experimental approach, varying the presentation of the composite dimensions between subjects. To explore participants' use of the dimensions, treatments were designed in which either the Pain or Discomfort (or P/D for short) dimension or the Anxiety or Depression (A/D) dimension was altered. In some of the altered presentations, one of the composite dimension's components was not mentioned at all. In other presentations, the composite dimension was presented as two separate dimensions. This approach allows a wide range of possible interpretations of the composite dimensions to be investigated.

Literature
Previous evidence suggests that participants interpret the components of the composite dimensions to represent distinct concepts. For example, Bryan et al (2005) explored the interpretation of the A/D composite dimension. In a focus group study, A/D was presented as two separate dimensions in a three-level EQ-6D. Their qualitative results suggested that respondents tended to "interpret anxiety and depression as distinct and independent concepts". The authors also presented a quantitative study where patients self-reported their health using EQ-5D alongside other clinical measures. The correlation coefficients for the A/D item against clinical measures of depression were similar to the correlation coefficients for the A/D item against clinical measures of anxiety, and the authors interpreted this as evidence to support the use of the composite. However, they did not examine the effects of splitting A/D in the quantitative study. Furthermore, they considered only self-report data, and so any differences between self-report and valuation contexts were not accounted for.
Our approach includes what is essentially bolting-off components of the composite EQ-5D dimensions. Whilst such bolting off has until now only been considered by Tsuchiya et al (2019), a considerable literature exists that bolts on a dimension to the EQ-5D, including cognition (Krabbe et al, 1999;Wolfs et al, 2007), sleep (Yang et al, 2013), vision (Longworth et al, 2014), hearing (Longworth et al, 2014) and tiredness (Longworth et al, 2014). Studies repeatedly find that including a dimension with "no problems" may change the valuation of a health state (also see Brazier et al, 2011, which bolted on Pain or Discomfort to an asthma-specific preference-based instrument). This violates an implicit assumption of preference-based health state classification instruments, namely, that any unmentioned dimensions have no problems. Instead, explicitly stating that a dimension has no problems appears to generate different valuations compared to not mentioning the dimension. While our own approach is different, the conclusions of the bolt-on valuation literature might suggest that splitting a composite dimension into two when it had no problems might change the value of the health state compared to the unaltered version, both (a) if we keep both components; or (b) if we drop one or the other. McDonald and Mullett (2020) investigated the effect of splitting P/D and A/D, whilst simultaneously collapsing Mobility and Usual Activities into a composite dimension. They found that splitting a dimension increased its importance in determining which health state was preferred in pairwise choice, and collapsing two dimensions into one reduced their importance. They concluded that individuals have a tendency towards equally weighting attributes in a multi-attribute choice. However, they were unable to examine the effect of dropping components and did not consider the self-report context.
Finally, Tsuchiya et al (2019) examined, amongst other things, the effect of splitting the composite dimensions of the EQ-5D and presenting both components separately in place of the composite dimension, so forming EQ-6D. Comparing the use of each level of each dimension in self-report, they showed that reports of "no problems" were more frequent when the composite dimensions were presented, compared to where the composites were split into two separate components. This implies individuals do not use the composite dimensions X/Y literally to mean "X or Y" in self-report. The effect of splitting the composites was more pronounced for A/D than for P/D, and the difference between the two composites may arise because of the differences in the way the components relate to one another. Although pain is commonly interpreted as a more severe form of discomfort (for example, in the well-established McGill Pain Questionnaires (Melzack, 1975; 1987); see also the indicative evidence in Macran and Kind (2000)), there is evidence to suggest that anxiety and depression are entirely separate concepts. For a more detailed examination of the argument, see Bryan et al (2005).
Furthermore, Tsuchiya et al showed that in valuation tasks, the coefficients for the composite dimensions are related to, but not identical to, the sum of the coefficients on the components when both are presented separately. The patterns of their data suggest that splitting the composite has a different effect for P/D than for A/D, illustrating that we do not fully understand the way composite dimensions are used in the EQ-5D.
We present the first dedicated study to investigate the inherent ambiguity in EQ-5D regarding the P/D and the A/D composite dimensions across the full spectrum of possible interpretations by systematically varying the way the components are presented. Our specific aims were to ask:
(1) How are the P/D and A/D dimensions interpreted in self-reporting of own health?
(2) How are the P/D and A/D dimensions interpreted in valuation of stylised health states?
The results suggest that the interpretation of the composite dimensions differs between P/D and A/D. The interpretation also differs between self-report and valuation tasks for A/D, but our participants applied their interpretations of P/D consistently across the self-report and valuation tasks.

A theory on the use of composite dimensions
Table 1 sets out a stylised scheme that gives a series of logically possible interpretations of a self-reported level on a composite dimension under the assumption that the composite is used to report the severity level for the component on which the most severe problems are reported. It reports what the potential underlying levels of each component could be, for a given severity level of the composite. We only use the first, third and the fifth levels of EQ-5D-5L, since these are sufficient to illustrate our point. The only unambiguous composite dimension is given in row i: "No problems with X or Y", which must mean no problems on either component. However, if the respondent self-reports having "Moderate problems with X or Y", there are at least three potential combinations of the underlying components (rows ii-iv). Worse still, "Extreme problems" has five possible interpretations (rows v-ix). This demonstrates that even if self-reporting behaviour perfectly follows this pattern, it is not possible to logically determine how the levels of a composite dimension ought to be interpreted.
With all five levels of EQ-5D-5L, the total number of potential combinations per composite dimension expands from nine to 25 (one for level 1; three for level 2; five for level 3; seven for level 4; and nine for level 5). Since there are two composite dimensions, in effect, if a respondent self-reports level 3 for both composite dimensions, this could logically mean any one of 25 (= 5 x 5) possible combinations of the four components. In the context of health state valuation, respondents may interpret the levels of the composite dimensions in the health states to be valued as: the level of one or the other component which they think is more important; the level of both components; or any other combination. The interpretation is likely to vary across respondents, and may not be stable across the valuation exercise or across the two composite dimensions.
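The counts quoted above can be checked by brute-force enumeration. The following is a minimal sketch, assuming (as in Table 1) that the composite level equals the level of the worse component; the function name `component_combinations` is introduced here purely for illustration.

```python
from itertools import product


def component_combinations(composite_level, levels=(1, 2, 3, 4, 5)):
    """Return all (X, Y) component levels consistent with a composite level,
    assuming the composite reports the worse of the two components."""
    return [
        (x, y)
        for x, y in product(levels, repeat=2)
        if max(x, y) == composite_level
    ]


# Three-level illustration (levels 1, 3 and 5 only), as in Table 1.
three_level = {c: component_combinations(c, levels=(1, 3, 5)) for c in (1, 3, 5)}

# Full five-level severity scale of EQ-5D-5L.
five_level = {c: component_combinations(c) for c in (1, 2, 3, 4, 5)}

for level, combos in five_level.items():
    print(f"composite level {level}: {len(combos)} possible component combinations")

# Level 3 reported on both composites: combinations of the four components.
both_level_3 = len(five_level[3]) ** 2
print("level 3 on both P/D and A/D:", both_level_3)
```

Under this assumption, a composite at level k is consistent with 2k - 1 component pairs when all five levels are available, which is why the ambiguity grows with severity.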

Design
The study used adapted versions of the EQ-5D-5L instrument to collect data on self-reported current health, self-reported health at the time that the respondent felt the worst because of their health (hereafter "worst recalled health"), and valuation of stylised health states. These were conducted over two phases: Phase 1 collected the full valuation data and some limited self-reported data; Phase 2 collected more thorough self-reported data. The analyses reported in this paper are based on the self-reported data from the second phase and the valuation data from the first phase.
For all analyses that could be applied in both datasets, results of the self-reported data from the first phase are consistent with those from the second phase, and are available upon request.
Self-reported worst recalled health was included because self-reported current health of the general public in EQ-5D-5L typically has around a third of the sample reporting full health, meaning little

To clarify the explanation, we will refer to Drop Dis, Drop Pai, Drop Dep and Drop Anx as "partial drop" versions and the non-dropped dimensions will be referred to as Pain only, Discomfort only and so on. Split Pai Dis and Split Anx Dep will be referred to as "split" versions and the dimensions will be referred to as Anxiety separate, Depression separate and so on.

Self-reported own health
After reading an information sheet and giving informed consent, participants' first task was to self-report their current health using the dimensions for their version. The question asked: "Please indicate which statements best describe your health TODAY." To answer, participants selected the relevant severity statement for each dimension. Next, respondents were asked about their worst recalled health, following the approach of Devlin et al (2017). Specifically, the question asked: "To help you start thinking about how you feel about different areas of health, we would like you to think about the time that you felt the worst because of your health. Indicate which statements best describe your health during the time that you felt worst because of your health." Next, respondents assessed their own general health as: excellent, very good, good, fair, or poor.
Finally, participants were asked to self-report their health on the missing dimensions. Specifically, participants who were in the Partial Drop treatments self-reported their current health on the dropped dimension and on the composite dimension; participants who saw the composite dimensions self-reported their health on each component separately; and participants who were in the Split treatments self-reported their current health on the composite dimension. The process was repeated for worst recalled health, generating complete information about self-reported current and worst recalled health.

Health state valuation
A Discrete Choice Experiment with a duration attribute (DCETTO) was used for the health state valuations. This approach was developed by Bansback et al (2012) and refined in Bansback et al. (2014) and Mulhern et al. (2018). It combines the Discrete Choice Experiment with the Time Trade Off (TTO) to allow health state values to be elicited from paired choices, with values on a scale anchored at 1 for full health and 0 for a state equivalent to being dead (for a review, see Mulhern et al., 2019). In our application, it involves participants making a series of pairwise choices between stylised health scenarios described by the five (or six) dimensions relevant to their version of the experiment, plus a duration dimension that could take the value 6 years, 8 years or 10 years, followed by death. Since the study had a methodological focus and did not aim to produce an alternative value set for EQ-5D-5L, for the DCETTO we restricted the levels of the health states to be level 1 (no problems), level 3 (moderate problems) or level 5 (extreme problems/unable), omitting "slight" and "severe" problems. This reduces the number of possible health state comparisons, and follows Tsuchiya et al (2019). An example task is provided in Figure 1 and the instructions for the DCETTO task are provided in Appendix 1.
Choice sets were selected using Ngene (Choice Metrics, 2012) with priors of zero and assuming that a conditional logit model was the true model. Ten balanced D-efficient designs were generated for 5D and 6D. Whilst the final designs cannot be directly compared between the 5D and 6D versions (they have different numbers of attributes), they were designed using the same process and were each maximally efficient. We selected 48 choice sets for the 5D versions (the standard and partial drop versions) and 60 for the 6D versions (the split versions). This allows us to estimate a model with linear duration, including main effects and interactions for each of the attribute levels and duration, which involves 21 and 25 parameters for the 5D and 6D cases, respectively. For each participant in each version, 20 of these pairs were drawn and presented at random from the 48 or 60 possible tasks. understanding. The final practice question did not involve a dominated option.
After completing the main valuation tasks, respondents were invited to select all statements that applied to their experience of the health state valuation exercise.

Background characteristics
At the end, demographic information was collected, including gender, age, having experienced serious illness, employment status, education, and whether the participant was responsible for children under 18.

Recruitment
Participants for both phases were recruited through the Prolific.ac online participant pool. Members of this pool have varied ages, genders and incomes and are drawn from geographically diverse locations. We did not pre-screen on demographic characteristics or language, but restricted the sample to UK residents aged 18 or over. Importantly, the samples for both phases were drawn from the same population. In an information page, participants were informed that their data would be kept confidential and would not be linked to their identity. Ethical approval was granted by the University of Warwick's Humanities and Social Sciences Research Ethics Committee. Participants in phase 1 received £1.50 for their participation and participants in phase 2 received £0.80 (since their task was less time consuming). Phase 1 data were collected in July 2017 and phase 2 data were collected in August 2019.

Analytical approach
Self-report data
To analyse the data for self-reporting own health, we first conduct a test of 'literal' behaviour, establishing whether participants treated the composite dimensions as literally meaning "or" by comparing the incidence of "no problems" being reported, since these are the cases for which the composite and its components are unambiguously related (see Table 1 for details). In the event that these tests show that participants do not appear to treat the composite literally, we then conduct a test of self-reporting rules, which establishes what reporting rule best fits the data.
We hypothesise four potential self-reporting rules as follows:
1. The composite is used as the worse problem across the components (i.e. as an "or")
reporting rule, we take the difference between the actual and the predicted levels for P/D and A/D.
Finally, for each composite dimension and by each of the self-reporting rules, we calculate the mean absolute difference between the prediction and the actual report, pooling across the individuals. This gives a quantified measure of the error in predictions. The self-reporting rule with the smallest error best represents participants' self-reporting behaviour.
To provide confidence intervals around the estimate of error for each self-reporting rule, we apply a bootstrapping approach. For each composite dimension, by each self-reporting rule, participants are randomly sampled with replacement and the mean absolute error calculated for 10,000 samples. The distribution of mean absolute errors is used to find the 95% percentile range.
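The error measure and bootstrap described above can be sketched as follows. This is an illustrative sketch rather than the study's analysis code: the four rule definitions are inferred from the rule names used in the results (worse component, first-mentioned component alone, second component alone, and mean of the components), the mean rule is left unrounded as an assumption, and the example records are made up purely to show the mechanics.

```python
import random
from statistics import mean

# Hypothetical rule definitions, inferred from the rule names in the results.
# "first" is the first-mentioned component (Pain or Anxiety),
# "second" is the other component (Discomfort or Depression).
RULES = {
    "worse_component": lambda first, second: max(first, second),
    "first_component": lambda first, second: first,
    "second_component": lambda first, second: second,
    "mean": lambda first, second: (first + second) / 2,  # unrounded: an assumption
}


def mean_absolute_error(records, rule):
    """records: list of (first, second, composite) severity levels, one per participant."""
    return mean(abs(composite - rule(first, second)) for first, second, composite in records)


def bootstrap_interval(records, rule, n_samples=10_000, seed=0):
    """Percentile interval: resample participants with replacement, recompute the error."""
    rng = random.Random(seed)
    n = len(records)
    errors = sorted(
        mean_absolute_error([records[rng.randrange(n)] for _ in range(n)], rule)
        for _ in range(n_samples)
    )
    lower = errors[int(0.025 * n_samples)]
    upper = errors[int(0.975 * n_samples) - 1]
    return lower, upper


# Made-up illustrative records (NOT study data): (first, second, composite).
example_records = [(3, 1, 3), (2, 1, 2), (1, 3, 1), (4, 2, 4)]

for name, rule in RULES.items():
    error = mean_absolute_error(example_records, rule)
    low, high = bootstrap_interval(example_records, rule, n_samples=1_000)
    print(f"{name}: error={error:.3f}, 95% range=({low:.3f}, {high:.3f})")
```

Resampling whole participants, rather than individual responses, keeps each person's component and composite reports together, which matches the description of sampling participants with replacement.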

DCETTO data
To analyse the valuation data from the DCETTO, we follow the approach taken by Bansback et al. (2012). The approach is to model participants' utility ($U_{ij}$), where $i$ denotes the participant and $j$ denotes the health scenario being considered, such that $j = 1, 2$ are the scenarios considered in each pairwise choice. Utility is modelled as a function of all possible attribute levels of the EQ-5D (or 6D in the split conditions), and duration. We let a vector of dummy variables for each possible attribute level be $x$, with "no problems" as the reference category. We model duration, $t$, as continuous. This gives the formula (drawn from Bansback et al. 2012):

$$U_{ij} = \alpha + \beta t_{ij} + \lambda' x_{ij} t_{ij} + \varepsilon_{ij} \qquad (1)$$

In this model, $\alpha$ is a constant measuring the tendency to choose the specified option, ceteris paribus; $\beta$ captures the preference for living in full health for one year; $\lambda$ is the disutility associated with the levels of the attributes specified in the given health state when experienced for one year; and $\varepsilon_{ij}$ is the error term, assumed to be iid. We make the standard assumption of constant proportional trade-offs in life years, modelling duration as a continuous, linear variable.
A mixed model logistic regression is estimated for each presentation version, to establish the factors influencing the choice of one health scenario over another. This involves specifying, for each comparison, the difference between the two options on the specified dimensions (duration and attribute levels) and modelling the probability that an option is chosen over another option given these differences. Dummy variables are used for the attribute severity levels (variable $x$ in Eq. 1). If the health scenario presented on the left-hand side of the screen has the relevant severity level on a dimension, the value of the dummy is set to 1; if the scenario presented on the right has this severity level on this dimension, the dummy is set to −1. If the health scenarios have the same severity level in a given dimension, the corresponding dummy is 0. Random effects are estimated at the respondent level.
A further step is to anchor the coefficients, which allows them to be compared to one another and interpreted meaningfully. This is done, again following Bansback et al. (2012), by dividing each element of $\lambda$ by $\beta$.
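The difference coding and the anchoring step can be illustrated with a short sketch. It is not the estimation code used in the study: the dimension labels (`"MO"`, `"PD"`), the function names and the coefficient values are all hypothetical, and the sketch covers only the coding of the attribute-level dummies and the division by the duration coefficient, not the mixed logistic regression itself.

```python
def difference_dummies(left, right, levels=(3, 5)):
    """Code one pairwise choice as left-minus-right dummies.

    left, right: dicts mapping a dimension label to its severity level.
    Returns +1 if only the left scenario has that level on that dimension,
    -1 if only the right scenario has it, and 0 otherwise (including when
    both scenarios share the level). Level 1 is the reference category.
    """
    coded = {}
    for dimension in left:
        for level in levels:
            coded[(dimension, level)] = (
                int(left[dimension] == level) - int(right[dimension] == level)
            )
    return coded


def anchor(attribute_coefficients, duration_coefficient):
    """Anchor attribute-level coefficients by dividing by the duration coefficient."""
    return {
        key: value / duration_coefficient
        for key, value in attribute_coefficients.items()
    }


# Hypothetical scenarios using two illustrative dimension labels.
left_scenario = {"MO": 1, "PD": 5}
right_scenario = {"MO": 3, "PD": 5}
coded_choice = difference_dummies(left_scenario, right_scenario)
print(coded_choice)

# Hypothetical raw coefficients, chosen only to show the arithmetic.
anchored = anchor({("PD", 5): -0.2, ("MO", 3): -0.05}, duration_coefficient=0.5)
print(anchored)
```

After anchoring, each coefficient is expressed relative to the value of a year in full health, which is what allows decrements to be compared across presentation versions.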

Demographics
In total, 1415 participants took part in the self-report phase. Each variant has just above 200 respondents. Appendix 2 reports the demographic characteristics, demonstrating that the randomisation into versions was successful.

Descriptive statistics on use of composite dimensions
The only logically unambiguous interpretation of the composite dimensions relates to the reporting of "no problems" (see Table 1 for details). In self-reporting their current health, pooling across versions, we find that 47% of participants (n=666) reported no problems with Pain or with Discomfort when these were reported separately, and 51% (n=718)

However, with over one third of the sample reporting no problems with anxiety or depression, and almost half reporting no problems with pain or discomfort, current health is clearly not the ideal testbed for examining patterns of self-reporting. Self-reported worst recalled health provides a useful alternative. Pooling across versions, we find that 13% of participants (n=179) reported no problems with Pain or with Discomfort when these were reported separately, whilst 19% reported no problems with the composite dimension P/D. This difference in proportions is strongly

Comparing self-reporting rules
Next, we ask what self-reporting rule best fits the observed data, out of the four rules detailed in the study design section. For each composite dimension, we excluded cases where the participant had reported the same value across both components since all models predict the same outcome in these cases. When reporting pain and discomfort, 1111 subjects gave the same ratings when describing their current health, and 901 did so when reporting their worst recalled health. For anxiety and depression, 864 participants gave the same ratings when describing their current health, and 726 when reporting their worst recalled health.

Figure 2
Mean absolute error in predicting the composite dimension from the components, using each of the four self-reporting rules. 95% confidence intervals shown. Higher bars signify worse prediction errors and less support for that hypothesis.

Figure 2 shows the mean absolute prediction error of each self-reporting rule, using the data pooled across the seven versions. In Appendix 4 we report these absolute prediction errors for each version separately. When reporting current P/D, the best predictions come from assuming P/D means Pain (i.e. the first mentioned component rule). This self-reporting rule significantly outperforms both the 'worse component' rule and the 'discomfort alone' rule, though its confidence interval overlaps with that of the mean rule. When worst recalled health is reported, the results strengthen and clearly support the interpretation that P/D is used to report Pain.
For current A/D, the 'worse component' self-reporting rule, in which the composite level is reported at the same level as the more severe of the components, is the most accurate in predicting responses. This rule significantly outperforms the mean rule and the 'depression alone' rule, but its confidence interval overlaps with that of the 'anxiety alone' rule. Again, the results strengthen when participants reported their worst recalled health. Here, the 'worse component' rule significantly outperforms all others.

Valuation
A total of 1007 participants completed the valuation phase, and another 18 began the study but failed to complete it. Each variant has 123-149 respondents, and Table A2 in Appendix 2 presents the demographics for each of the seven versions, showing that randomisation resulted in similar demographics across versions. In the practice questions, of the 1007 participants that took part in the study, 47 subjects made a single mistake: 24 in the first choice, 14 in the second and 9 in the third. No participant made more than one mistake.
The data from the follow-up questions on engagement indicate that most respondents did not struggle with completing the study and they tended to find it interesting and clear. The results are presented in Appendix 5.

Discrete choice experiment with duration
The estimated beta coefficients from the regression analyses are presented in Appendix 6.
Across the presentation versions, the coefficients' signs, and the difference in their magnitudes between levels within a given dimension, are as anticipated. Additional years of life significantly increase the likelihood of a scenario being selected across all versions. In almost all cases, problems on a dimension reduce the likelihood that the scenario is selected, compared to having no problems. The exception is Mobility at level 3 (moderate problems) in the Pain only version, where the coefficient has the expected sign but is not significantly different from zero. In all cases, extreme problems (level 5) reduce the chance of selecting the scenario by more than moderate problems (level 3).

Figure 3
Anchored coefficients for the P/D dimension across versions. Grey bars represent extreme problems (level 5). 95% confidence intervals shown. Asterisks represent versions where the P/D composite was altered, and so differences might be expected.

Figure 4
Anchored coefficients for the A/D dimension across versions. Grey bars represent extreme problems (level 5). 95% confidence intervals shown. Asterisks represent versions where the A/D composite was altered, and so differences might be expected.

The heights of the bars can be compared within and across versions. Clearly, level 5 problems are significantly worse than level 3 problems across all versions, and both are significantly worse than the baseline case, no problems. The interesting comparison is between versions. As anticipated, there are no statistically significant differences in the importance of pain or discomfort across the first, fifth, sixth and seventh pairs of bars, since in these versions the P/D composite was presented unaltered. The highest coefficient is that where level 5 Pain and level 5 Discomfort were presented separately, in the Split Pai Dis version.
Comparing the partial drop versions, Drop Dis and Drop Pai, reveals that in valuation, Pain only is clearly perceived to be worse than Discomfort only. Comparing these with the standard presentation reveals that, for levels 5 and 3, the utility decrement from Discomfort only is significantly smaller than that from the composite P/D, whilst the decrements for the composite P/D and for Pain only are indistinguishable. This suggests that when presented with the composite P/D in a valuation exercise, people interpret this as pain.
Turning to Anxiety and Depression in Figure 4, as anticipated we observe no differences between anchored coefficients from the versions where the composite A/D was presented (that is, the first four versions on the diagram). The split treatment where the components were presented separately (i.e. where the coefficient is the sum of the coefficients on the individual components) generates a significantly higher utility decrement compared to the composite, but less than the sum of the two coefficients from the partial drop versions. This holds for both severity levels, 3 and 5. Comparing the partial drop versions and the standard presentation, the level 5 coefficient for Depression only is significantly greater than that for Anxiety only, yet in this case, the composite decrement lies between the decrement for Anxiety and that for Depression.

Discussion
We highlight an inherent logical ambiguity in the EQ-5D system: it is not possible to logically determine how the levels of a composite dimension ought to be interpreted. We clarify the nature of this ambiguity, and provide empirical evidence that explores how the composite dimensions are actually used in self-report and interpreted in valuation exercises, by systematically altering the presentation of the composite dimensions. The results suggested three key findings:
1) In self-reporting of own health, the use of the composite differs across the dimensions. People appear to use P/D to self-report mainly the level of pain, whilst using A/D to self-report the component out of anxiety and depression for which they have more serious problems. This pattern is most clearly apparent when respondents self-reported their worst recalled health.
2) In valuation of stylised health states, the split versions showed that Pain was clearly perceived to be worse than Discomfort at the same level, and Depression was clearly perceived to be worse than Anxiety at the same level.
3) In valuation, the composite dimension P/D had a utility decrement similar to that for Pain, whilst the decrement for the composite dimension A/D was between that for Anxiety and for Depression.
A recurring insight from our results is that the interpretation of the EQ-5D composite dimensions is sensitive to the dimension of health in question: P/D is differently related to its components Pain and Discomfort than A/D is to Anxiety and to Depression. We find encouraging consistency in how the P/D composite is used and interpreted between task types: in both self-report and in valuation, we found evidence that P/D mainly represents pain. However, we find that A/D is inconsistently used across tasks, with the 'worse problem' interpretation holding for self-report, but not for valuation. This raises questions surrounding whether systematic biases arise when using self-reports in combination with value sets to value conditions involving a mental health detriment.
It is useful to compare these patterns with those found in Tsuchiya et al (2019), which had two null hypotheses relevant for our study. The first was that "the proportion of people who self-report level 1 in a composite dimension is no different from the proportion of people who self-report level 1 in both components when the dimension is decomposed"; in other words, that people use the composite dimensions literally when self-reporting no problems. This was not rejected for P/D, whilst it was for A/D, which is inconsistent with our results. They do not analyse the reporting of having problems at different severity levels. Their second null hypothesis was that in the valuation context "the disutility associated with a composite dimension is no larger than the disutility associated with either component at the same level". For the level 3 coefficients, this hypothesis was not rejected, but for the level 5 coefficients it was rejected. It appears that level 5 P/D was interpreted as extreme pain, while level 5 A/D was interpreted as extreme depression. In contrast, in our case P/D was always interpreted to mean pain, not just at the extreme level. However, the general conclusion is robust: there is an interacting effect between the context and the composite dimension.
What do our results mean for those using the EQ-5D to measure and value health? Firstly, they suggest there may be a failure to capture some important elements of health states. Specifically, since the P/D dimension in our study appears to be interpreted to mean pain consistently across these two contexts, the composite P/D might fail to capture discomfort in both self-report and valuation. This is despite the fact that the Partial Drop valuation task resulted in significant reductions in the perceived utility of a health state involving moderate or extreme discomfort.
Furthermore, when both pain and discomfort co-occur, this would not be captured by the composite.
Nevertheless, if discomfort is generally considered a mild form of pain, this concern is arguably mitigated.
Secondly, they raise difficulties when interpreting the health states that underlie self-reported EQ-5D profiles. The interpretation of A/D is not consistent across tasks. In our study, it appears to be interpreted as "the component of Anxiety and Depression with the most severe reported problems" in the self-report context, but as "an average of Anxiety and Depression" in valuation. This compounds the broader problem, described earlier, that the EQ-5D is fundamentally incapable of distinguishing between the health of people with different combinations of severity on the sub-dimensions. Our results from the Anx Dep split model clearly suggested that Depression is considered to be more serious (with an anchored disutility of -0.26) than Anxiety at the same level (-0.122), and that having both Anxiety and Depression gives the biggest detriment of all (-0.382). Yet these differences cannot be inferred in a valuation exercise with the composite dimension. A significant minority of our sample in the split versions self-reported problems with Anxiety only, Depression only, or both. Because these heterogeneous health states are indistinguishable in the self-reported composite dimension, the result is likely to be serious misallocation of health resources, with a bias towards the undervaluation of Depression and of the co-morbidity of Anxiety with Depression.
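The direction of these biases can be illustrated with a short calculation. The sketch below uses the three anchored disutilities reported above and assumes, purely for illustration, that a composite A/D level is valued at the simple mean of the Anxiety-only and Depression-only disutilities. That mean, the function name `valuation_gap`, and the sign convention are assumptions of this sketch, not estimates from our models.

```python
# Anchored disutilities reported for the Anx Dep split model (from the text).
ANXIETY_ONLY = -0.122
DEPRESSION_ONLY = -0.26
BOTH = -0.382

# ASSUMPTION (illustration only): the composite A/D level is valued as the
# simple mean of the two single-component disutilities.
composite = (ANXIETY_ONLY + DEPRESSION_ONLY) / 2


def valuation_gap(split_disutility: float, assigned: float = composite) -> float:
    """Return the assigned disutility minus the split-model disutility.

    Positive: the detriment is understated by the composite.
    Negative: the detriment is overstated by the composite.
    """
    return assigned - split_disutility


gaps = {
    "anxiety only": valuation_gap(ANXIETY_ONLY),
    "depression only": valuation_gap(DEPRESSION_ONLY),
    "anxiety and depression": valuation_gap(BOTH),
}

for state, gap in gaps.items():
    direction = "understated" if gap > 0 else "overstated"
    print(f"{state}: detriment {direction} by {abs(gap):.3f}")
```

Under this assumed averaging rule, the detriment is understated for Depression only and for the co-morbid state, and overstated for Anxiety only. A different rule for the composite would change the magnitudes.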
The overarching practical implication of the evidence we presented is that care must be taken when identifying health states from people in self-report, and linking these to valuations. Based on our results, the composite dimensions cannot capture the co-occurrence of discomfort with pain. Nor can they distinguish between different combinations of severity levels of anxiety and depression. This may result in undervaluation of some health states (for example, co-morbidities) and overvaluation of others (for example, self-reported anxiety with no depression being interpreted as depression with no anxiety). More research is required to understand whether splitting both of the composites to create an EQ-7D would be appropriate, especially given the extra burden this would place on valuation studies. An EQ-7D with five levels each will have 78,125 unique health states, a substantial increase from the current 3,125 with EQ-5D-5L (and the 243 with EQ-5D-3L). Valuing such an instrument using a DCE with duration will involve choice tasks made up of sixteen pieces of information instead of twelve. Furthermore, if (as we contend) discomfort overlaps with the mild end of pain, then the dimensions will not be fully independent, which will impose restrictions on choice design (for example, no or slight discomfort cannot appear alongside severe or extreme pain), even though the estimated algorithm will predict values for health states that ought not to exist. A realistic proposal for re-organising the EQ-5D must be based on a thorough analysis of the benefits and costs of doing so (including the additional valuation studies and the management of the transition period in health technology assessment procedures). However, based on the current study, we would suggest that such a cost-benefit analysis should give consideration to an EQ-6D, in which P/D is split and Discomfort dropped while A/D is split into two separate dimensions.
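The counts quoted above follow from simple arithmetic, which the sketch below reproduces. It assumes that each DCE choice task presents two health states and that each state is described by one level per dimension plus a duration. The function names `n_states` and `dce_items` are hypothetical, and the EQ-6D row is computed here by the same rule rather than taken from any reported figure.

```python
def n_states(dimensions: int, levels: int) -> int:
    """Number of unique health states: one level chosen per dimension."""
    return levels ** dimensions


def dce_items(dimensions: int, alternatives: int = 2) -> int:
    """Pieces of information in one choice task.

    ASSUMPTION: each alternative shows every dimension plus one duration.
    """
    return alternatives * (dimensions + 1)


# (label, number of dimensions, number of levels)
instruments = [
    ("EQ-5D-3L", 5, 3),
    ("EQ-5D-5L", 5, 5),
    ("EQ-6D (five levels)", 6, 5),
    ("EQ-7D (five levels)", 7, 5),
]

for label, dims, levels in instruments:
    print(f"{label}: {n_states(dims, levels):,} states, "
          f"{dce_items(dims)} items per choice task")
```

With these assumptions, the five-dimension instruments give twelve items per task and the seven-dimension instrument gives sixteen, matching the figures in the text.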
We focused on the EQ-5D because of its widespread use, but we believe the concerns raised by our research (and the opportunities for further investigation and improvement of the methodology) apply more broadly to any health (or wellbeing) descriptive system that relies on composite dimensions. These include, but are not limited to, the AQOL-8D (pain or discomfort); 15D (the excretion and mental function dimensions); and SF-6D (the mental health dimension). The issues we raise should also be taken into account when designing new instruments.
Future research could explore the mechanisms underlying the patterns we observe. For instance, work could be done to understand the role of language through studying other language versions of the EQ-5D. Another extension would be to study whether those experiencing mental health problems self-report and value the composite A/D differently. We asked respondents to report their worst remembered health, which goes some way to achieving this, but exploring the same issues with a dedicated sample of this kind would allow us to further generalise our findings.