An investigation of age dependency in Dutch and Chinese values for EQ-5D-Y

Aims The primary aim was to explore the age dependency of health state values derived via trade-offs between health-related quality of life (HRQoL) and life years in a discrete choice experiment (DCE). The secondary aim was to explore whether people weigh life years and HRQoL differently for children, adolescents, adults, and older adults. Methods Participants from the general population of the Netherlands and China first completed a series of choice tasks offering choices between two EQ-5D-Y states with a given lifespan. The choice model captured the value of a year in full health, the disutility determined by EQ-5D-Y, and a discount rate. Next, they received a slightly different choice task, offering choices between two lives that differed in HRQoL and life expectancy but produced a similar number of quality-adjusted life years (QALYs). Participants were randomly assigned to complete the survey for one of three or four age frames: a hypothetical person of 10, 15, 40, or 70 years (the last one only applicable to China), to allow the age dependency of the responses to be explored. Results A total of 1,234 Dutch and 1,818 Chinese people completed the survey. Controlling for time preferences, we found that the agreement of health state values for different age frames was generally stronger in the Netherlands than in China. We found no clear pattern of differences in the QALY composition in either sample. The probability distribution over response options varied most when levels for lifespan or severity were at the extremes of the spectrum. Conclusion/discussion The magnitude and direction of age effects on values seemed dimension- and country-specific. In the Netherlands, we found a few differences in dimension-specific weights elicited for 10- and 15-year-olds compared to 40-year-olds, but the overall age dependency of values was limited. A stronger age dependency of values was observed in China, where values for 70-year-olds differed strongly from the values for other ages.
The appropriateness of using existing values beyond the age range for which they were measured needs to be evaluated in the local context.


Introduction
In recent years, the demand for pediatric multi-attribute utility instruments has grown (Chen and Ratcliffe, 2015). One of these utility instruments is the EQ-5D-Youth (EQ-5D-Y), a child-friendly version of the well-known adult questionnaire EQ-5D-3L (Wille et al., 2010). It contains the same five health dimensions, although the wording of three of them (i.e., self-care, anxiety, and usual activities) has been modified to fit the needs of the younger respondent. A visual analog scale (VAS) is also included, with endpoints of 0 (the worst health you can imagine) and 100 (the best health you can imagine). The EQ-5D-Y questionnaire can be filled out by children from the age of 8, while for children aged 4-7, a proxy version can be applied. EQ-5D-Y value sets are currently available for nine countries (Devlin et al., 2022).
Key challenges in the area of child health valuation are the impact of different perspectives, i.e., adult, adolescent, or child preferences, and the impact of different health state valuation methods (Rowen et al., 2020). The EQ-5D-Y valuation protocol requires that the general population be asked to value the EQ-5D-Y health states as proxies for children. People no longer value the health state of a person like themselves but that of a hypothetical 10-year-old child. To date, it is unknown whether the obtained values will be sensitive to the specified age of the hypothetical child (e.g., a child aged 10 or an adolescent aged 15), and if so, what framing of age is optimal.
The available evidence about the age dependency of health-related quality of life (HRQoL) values is limited. Kind et al. showed that by using the visual analog scale (VAS), the obtained values were lower for children when respondents were asked to imagine that a health state concerned a 10-year-old child compared to when they valued that state for themselves or another adult (Kind et al., 2015). These results suggest that health problems will affect a child's HRQoL more than an adult's HRQoL. However, the VAS values of Kind et al. were obtained on a scale with the best and worst imaginable health states as the top and bottom anchors and not on the full health-dead scale required for the computation of quality-adjusted life years. Kreimeier et al. (2018) reported that TTO values for children exceed those for adults in the same health state. Shah et al. (2020) found the same result across a range of methods that all produced values on the full health-dead scale.
To better understand the age dependency of HRQoL values, we need to carefully examine the context and meaning of responses given to questions, especially when they involve a time trade-off. Because TTO values are derived from a trade-off between HRQoL and time, HRQoL values are confounded with preferences for time. As a result, differences in TTO values for adults and children do not have a clear interpretation: are changes in health affecting children's HRQoL less, or are variations in time preferences contributing to the difference as well? This issue needs to be investigated further in order to better understand differences in health state values for children, adolescents, adults, and/or older adults and to advance valuation methods.
The main objective of our research was to examine how age impacts the valuation of EQ-5D-Y health states using a discrete choice experiment (DCE) that included a duration attribute. The second objective was to study whether there are cultural differences when valuing health states for children, adolescents, or adults. The third objective was to explore whether people attach different relative weights to life years and quality of life for children, adolescents, adults, and older adults.

Strategy
Respondents in the Netherlands were randomized over three arms that differed only in the framing of the valuation task with respect to the age of the hypothetical person who would experience the health states: 10 years (arm 1), 15 years (arm 2), or 40 years (arm 3), representing a child, an adolescent, and an adult. The study in China adopted the same study design as used in the Netherlands and extended it with a fourth study arm focused on older adults (70 years). This was done to increase the contrast between arms and to increase knowledge of the validity and valuation of the EQ-5D in the elderly population. Respondents in both countries completed two tasks. First, they received a series of questions from a discrete choice experiment featuring EQ-5D-Y health states with an associated duration. Next, respondents received a series of questions asking about their preferences for a "QALY composition". Details of both tasks are provided below. Approval for this study was given by the Ethics Committees of the University of Maastricht and the Institutional Review Board of Fudan University School of Public Health before the start of the study. Data collection took place between August and December 2017 in the Netherlands and between May and July 2019 in China.

EQ-5D-Y
EQ-5D-Y is a five-dimensional measure of health-related quality of life, derived from EQ-5D (Wille et al., 2010). The included dimensions are mobility, looking after myself, doing usual activities, having pain or discomfort, and feeling worried, sad, or unhappy. Each dimension has three levels: no problems, some problems, and a lot of problems.

Sample
In the Netherlands, respondents were recruited from a commercial panel, "Panelinzicht". Each respondent received an invitation with a link to participate in the survey. To make sure the sample was representative of the Dutch population, stratified sampling was applied. This means that three strata were defined beforehand: age (with 18 years as a minimum age), gender, and education. Based on the classification used by Statistics Netherlands (Centraal Bureau voor de Statistiek), the eight levels of education were divided into lower, middle, and higher education. In China, the respondents were enrolled by Survey Engine, and quota sampling was used to generate a representative sample of the general adult population in terms of age and gender.

The survey
The online survey was developed by Survey Engine in both countries, with the Dutch version being translated into a Chinese version. It started with three questions regarding birth date, gender, and education. Subsequently, respondents were asked to describe their own health based on the EQ-5D-3L and the VAS. Then, the objective of the study was explained, and respondents were asked to complete 15 choice tasks from a discrete choice experiment (DCE). The choice tasks were formatted as matched pairwise choices, following Jonker et al. (2017). This means that respondents were first asked which of two EQ-5D-Y states, A or B, they preferred for either a 10-year-old child, a 15-year-old adolescent, a 40-year-old adult, or a 70-year-old (Chinese version). The two options differed in health but shared an equal life span. Next, they were asked to choose between health states B and C. C represented perfect health, i.e., no problems in any of the five EQ-5D-Y dimensions, but always offered fewer life years than B. To make the choice task easier, color coding was applied, with more severe problems colored darker and less severe problems colored lighter (Jonker et al., 2018a). After the DCE, feasibility questions were presented: respondents were asked whether they experienced any difficulties when choosing between A and B and between B and C. Examples of both choice sets are presented in Figures 1 and 2.
Next, we presented a slightly different choice task that we dubbed a "QALY composition task". Eight QALY composition tasks were administered. We developed the task to let responses directly indicate whether people weigh life years and quality of life differently, that is, whether the relative weights attached to time and HRQoL vary, for children, adolescents, adults, or older adults. The QALY composition task involved choices between different ways of achieving a similar QALY total [e.g., life A, 2 years in full health (100% QoL), vs. life B, 4 years in 50% QoL]. Respondents could indicate their preference for life A or life B on a 5-point Likert scale, varying from a very strong preference for life A to a very strong preference for life B. An example of a QALY composition task is presented in Figure 3.
At the end of the survey, a number of background questions were asked: employment status, experience in working with children, having children, experience with serious illness of a child, experience with own health during youth, having brother(s) and/or sister(s), experience with serious illness in sibling(s), whether it would have been worse or not if the respondents themselves, instead of the hypothetical 10-, 15-, 40-, or 70-year-old, had experienced the health states described in the survey, what kind of child, adolescent, adult, or older adult they were thinking of when answering the choice tasks, and which religion they belonged to.

Experimental design DCE
An experimental design with 150 matched pairwise choice questions was generated using a two-step approach. The EQ-5D-Y states featured as options A and B were selected first; subsequently, option C was added, and in a separate step, the duration levels associated with options A, B, and C were selected. This two-step approach was used to promote consistency with a UK study that used a DCE without duration (Mott et al., 2021 plenary meeting of EuroQoL). Briefly, A and B were selected using an algorithm to create a Bayesian efficient design programmed in Stata. The candidate set was restricted to pairs that had overlapping severity levels in two dimensions. The design accounted for the main effects and two-way interactions. The initial design was created without priors, but data collection was paused twice to allow interim analysis of the data. The obtained coefficients were used as priors to update the design for the next round of data collection. As mentioned above, the C alternative always referred to full health and hence dominated A and B in terms of quality of life, but it was paired with a shorter duration, implying a time trade-off question. The selection of the levels of duration associated with A and B (the same level) and with C (a shorter duration) was also informed by a Bayesian efficient design algorithm (cf.), but this part was programmed in C++ because the utility function accounted for possible non-linearities in preferences for time (i.e., discounting), which standard software packages such as NGENE or STATA could not handle (Jonker et al., 2018b).
Blocking was applied to divide the 150 matched pairwise choice tasks into 10 blocks, with each block containing 15 pairwise comparisons.
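As an illustration only, the blocking step can be sketched as a random partition of the 150 task indices into 10 equally sized blocks. The source does not describe how tasks were allocated to blocks, so the random allocation, the function name `make_blocks`, and the seed below are assumptions.

```python
import random


def make_blocks(n_tasks=150, n_blocks=10, seed=0):
    """Randomly partition task indices into equally sized blocks.

    Assumption: allocation is purely random; the study may have used a
    different blocking criterion (for example, level balance per block).
    """
    if n_tasks % n_blocks != 0:
        raise ValueError("n_tasks must be divisible by n_blocks")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    tasks = list(range(n_tasks))
    rng.shuffle(tasks)
    size = n_tasks // n_blocks
    return [tasks[i * size:(i + 1) * size] for i in range(n_blocks)]


blocks = make_blocks()
```

Each respondent would then be assigned one of the resulting blocks, so that every respondent answers 15 pairwise comparisons.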

Experimental design QALY composition task
The QALY composition task was constructed on the basis of an orthogonal array. The four variables linked to the orthogonal array were: (1) life years of A (levels: 2, 4, 6, or 8); (2) quality of life of A (levels: 0.2, 0.4, 0.6, or 0.8); (3) quality of life of B (levels: 0.2, 0.5, 0.7, or 1); and (4) the ratio of total QALYs in A/B (levels: 0.8, 1.0, or 1.2).
Together, these four variables were used to define the life years of B, as indicated in Table 1. The scenarios presented to respondents in the QALY composition tasks were defined by the variables in the shaded columns. Three variables were directly obtained from the orthogonal array, and the fourth (life years in option B) was computed by matching the information of the first three variables with the QALY multiplier. This procedure ensured that decision rules based on longest life, highest quality of life, or maximum number of QALYs would produce different results.
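The computation of the life years of B can be sketched as follows. This is a minimal reading of the description, assuming undiscounted QALYs (life years multiplied by quality of life) and assuming that the ratio is defined as QALYs of A divided by QALYs of B; the function name `life_years_b` is hypothetical, and any rounding applied in Table 1 is not reproduced here.

```python
def life_years_b(ly_a, qol_a, qol_b, ratio):
    """Life years of B such that QALYs(A) / QALYs(B) equals `ratio`.

    Assumes undiscounted QALYs: QALYs = life years * quality of life.
    """
    qalys_a = ly_a * qol_a
    return qalys_a / (ratio * qol_b)


# Example from the text: 2 years in full health vs. 50% QoL at equal QALYs.
example_equal = life_years_b(ly_a=2, qol_a=1.0, qol_b=0.5, ratio=1.0)

# A scenario in which A offers only 80% of the QALYs offered by B.
example_unequal = life_years_b(ly_a=4, qol_a=0.6, qol_b=0.5, ratio=0.8)
```

In the first example, life B lasts 4 years, as in the text. In the second, life B lasts about 6 years, so B offers a longer life and more QALYs, while A offers the higher quality of life.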

Framing of the survey for the age groups
Exactly the same DCE design and QALY composition task design were used in all arms. The only difference between arms was that respondents were asked to imagine that the health states applied to a different hypothetical person, aged 10, 15, 40, or 70 years. We used the wording of the EQ-5D-Y questionnaire to describe the health states in all arms. Only the examples mentioned in brackets for the dimension usual activities were taken from the adult version of the EQ-5D-3L for the 40-year-old and 70-year-old arms. For every respondent, randomization was applied per arm, per block, per choice task, and in the left-right order of the health states A and B.

Data quality management
We retained respondents in the sample if they had completed the DCE survey and were not classified as speeders. Speeders were removed from the sample using a speeding threshold set at 530 s for the entire survey. We set this relatively low threshold to account for the fact that choice questions in a DCE repeat much of their content and to avoid undue exclusion of valid responses.
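The retention rule can be sketched as a simple filter. The record fields `completed_dce` and `duration_s` are hypothetical names, and treating a respondent at exactly 530 s as a non-speeder is an assumption, since the source does not state how the boundary was handled.

```python
SPEEDING_THRESHOLD_S = 530  # threshold for the entire survey, from the text


def retain(respondents, threshold_s=SPEEDING_THRESHOLD_S):
    """Keep respondents who completed the DCE and were not speeders.

    Assumption: a speeder is someone whose total survey time is strictly
    below the threshold.
    """
    return [
        r for r in respondents
        if r["completed_dce"] and r["duration_s"] >= threshold_s
    ]


# Hypothetical records, for illustration only.
sample = [
    {"id": 1, "completed_dce": True, "duration_s": 900},
    {"id": 2, "completed_dce": True, "duration_s": 400},   # speeder
    {"id": 3, "completed_dce": False, "duration_s": 1200},  # incomplete
]
kept_ids = [r["id"] for r in retain(sample)]
```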

Discrete choice experiment
Logistic regression was used to analyze the respondents' DCE choices (STATA version 14). The parameters of the conditional logit model were estimated using maximum likelihood estimation. Conceptually, the utility that respondent n obtains from alternative j in choice task t is computed as the utility obtained from the health state characteristics $X_{njt}$ with their accompanying preference parameters ($\beta_n$), multiplied by the net present value ($NPV_{njt}$) of the number of years $T_{njt}$ associated with that health state, i.e., $U_{njt} = (\beta_n' X_{njt}) \cdot NPV_{njt}$ (formula 1). An exponential discount function was used to compute NPV (Jonker et al., 2018b), which defines $NPV_{njt}$ as a function of $T_{njt}$ and the discount rate r. Dummy coding was applied for the levels of the EQ-5D-Y, with no problems as the reference level. The coefficients from formula 1 that are associated with the dimension severity levels can be converted to the preferred scale for QALY computation by dividing the relevant $\beta_n$ by the preference parameter associated with years, based on the net present value computation.
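The structure of this model can be sketched in a few lines. The sketch is not the estimation code used in the study: it assumes a continuous exponential discount function, NPV(T) = (1 − exp(−rT))/r, which is one common form and may differ from the exact specification in Jonker et al. (2018b), and all function names and coefficient values are hypothetical.

```python
import math


def npv(years, r):
    """Net present value of `years` life years at discount rate r.

    Assumed form: continuous exponential discounting,
    integral of exp(-r * t) for t from 0 to `years`.
    """
    if abs(r) < 1e-12:
        return float(years)  # no discounting: NPV equals the duration
    return (1.0 - math.exp(-r * years)) / r


def utility(beta_years, beta_levels, dummies, years, r):
    """Systematic utility: per-year utility of the state times the NPV."""
    per_year = beta_years + sum(b * d for b, d in zip(beta_levels, dummies))
    return per_year * npv(years, r)


def choice_probability(u_a, u_b):
    """Logit probability of choosing A over B in a pairwise task."""
    return 1.0 / (1.0 + math.exp(-(u_a - u_b)))


def qaly_decrements(beta_years, beta_levels):
    """Rescale dimension-level coefficients to the QALY scale."""
    return [b / beta_years for b in beta_levels]


# Hypothetical coefficients: value of a healthy year and two decrements.
decrements = qaly_decrements(2.0, [-0.5, -1.0])
```

On the QALY scale, the value of a health state is then 1 plus the sum of the decrements of its dimension levels, so a state can take a negative value when the decrements sum to less than −1.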

Feasibility
Feasibility questions for the DCE were analyzed with descriptive statistics in SPSS version 16.

QALY composition
The QALY composition task provided ordinal responses on a 5-point Likert scale. By arm, we computed and compared the percentages of responses in each category. We displayed the results graphically using horizontally stacked bars. Because minimal differences were found, no attempt was made to study differences across arms using non-parametric tests.
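The descriptive step can be sketched as a tabulation of response percentages per category. The coding of the categories as integers 1 to 5 and the example responses are assumptions made for illustration; they are not study data.

```python
from collections import Counter


def likert_percentages(responses, n_categories=5):
    """Percentage of responses in each Likert category (coded 1..n)."""
    if not responses:
        raise ValueError("no responses to tabulate")
    counts = Counter(responses)
    total = len(responses)
    return [100.0 * counts.get(k, 0) / total
            for k in range(1, n_categories + 1)]


# Hypothetical responses to one task in one arm
# (1 = very strong preference for life A, 5 = very strong preference for B).
arm_percentages = likert_percentages([1, 3, 3, 5])
```

Applying the function to each arm and task yields the segments of one horizontally stacked bar per arm.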

Characteristics of the sample
In total, 5,126 Dutch and 4,128 Chinese respondents started the survey, and 1,730 and 2,494 respondents completed it, resulting in response rates of 34 and 60%, respectively. A total of 496 people were excluded from the Dutch sample as speeders and 676 from the Chinese sample. After these exclusions, the Dutch sample had N = 438 respondents in arm 1 (10 years old), N = 450 in arm 2 (15 years old), and N = 346 in arm 3 (40 years old). The final Chinese sample had 454, 455, 454, and 455 respondents in arms 1 (10 years old), 2 (15 years old), 3 (40 years old), and 4 (70 years old), respectively. Sample characteristics are presented in Tables 2A, B. The samples were representative of the populations in terms of sex and age, although the percentage of respondents with lower education in the Netherlands was smaller than in the population as registered by the Dutch National Bureau of Statistics (CBS), while in the Chinese sample, the percentage of respondents with college and higher education was much higher than Chinese norms (CotSNPCNE, n.d.).
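The sample flow reported above can be checked with a few lines of arithmetic. All numbers are taken from the text; only the variable names are introduced here.

```python
started = {"NL": 5126, "CN": 4128}
completed = {"NL": 1730, "CN": 2494}
speeders = {"NL": 496, "CN": 676}
arm_sizes = {"NL": [438, 450, 346], "CN": [454, 455, 454, 455]}

# Response rate: completers as a percentage of those who started.
response_rate = {c: 100.0 * completed[c] / started[c] for c in started}

# Final sample: completers minus respondents excluded as speeders.
final_n = {c: completed[c] - speeders[c] for c in started}

# The arm sizes should add up to the final sample in each country.
arms_match = {c: sum(arm_sizes[c]) == final_n[c] for c in started}
```

The rounded response rates and the final sample sizes of 1,234 and 1,818 agree with the figures reported in the text and the abstract.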

Feasibility
Tables 3A, B show the answers related to the feasibility questions. In the Netherlands, 53% of respondents in the 10-year-old arm felt it was difficult to choose between health states A and B, compared to 45% in the adolescent arm and 34% in the adult arm. In addition, when making a choice between an impaired health state B and a perfect health state C with a shorter life duration, 58% of the respondents in the child arm and 49% in the adolescent arm answered that it was difficult to very difficult, compared to 43% in the adult arm. In contrast, respondents across the four arms in China reported a similar degree of difficulty.
The percentage of respondents answering that their choices would not have been different if they themselves had experienced the health states, rather than a hypothetical child, adolescent, adult, or older person, varied across arms in the Netherlands (Table 3A). A total of 62% of the respondents in the adult arm indicated that answering the questions for themselves would have resulted in the same responses, vs. 36% in the child arm and 44% in the adolescent arm. In the child and adolescent arms, 28 and 24% of the people considered health problems or loss of life years less bad for themselves, whereas in the adult arm, respondents more often considered these issues worse for themselves. In China, fewer people stated that their responses would have been the same if they were asked about preferences for themselves (11-24%, varying across arms), and the majority (varying between 51 and 58%) of the people in all arms stated that they would consider health problems or loss of life years worse for themselves (Table 3B).

Results discrete choice experiment
Tables 4A, B show the results of the regression model on a latent scale for the Netherlands and China. The parameter "years" reflects the additional utility gained from a life year without health problems, before discounting, and is positive, as expected. In both countries, results show that additional life years generate utility. The interaction terms in the Dutch regression model all have the expected negative sign, except mobility level 2, showing that a deviation from full health with no problems is considered negative. The interaction terms for level 2 problems on the dimensions of self-care, usual activities, and pain/discomfort showed unexpected positive signs in China.
The estimated discount rate r varied between 0.22 and 0.25 across the arms in the Netherlands and was ∼0.30 in China in all four arms, suggesting strong discounting of future health outcomes.
Figures 4A, B present the results on a QALY scale (coefficient of the interaction term divided by the coefficient for years). Across arms in the Netherlands, we found a high level of agreement on the health state values, except for the dimensions of pain/discomfort and anxiety/depression; respondents traded off more time to avoid these problems for children than for adults. The Chinese results showed that respondents traded off more time to avoid severe problems in the 70-year arm.
The values for the worst health state (33333) were −0.630 for children, −0.452 for adolescents, and −0.452 for adults in the Netherlands. In China, by contrast, the value for the worst state was lowest for older adults (−0.870), followed by adolescents (−0.370), children (−0.340), and adults (−0.320).

QALY composition
Figures 5A, B present the distribution of the Likert responses by QALY composition task. We found no clear pattern of differences across arms in either country. The distribution over response options varied most when the life years or quality of life were at the extremes of the spectrum. In the Netherlands, the only distinction between arms was that the percentage of responses in the third response category, indicating no preference for A or B, seemed to be the largest when the questions concerned a 10-year-old child. The Chinese results showed a larger percentage of respondents indicating no preference between life A and life B compared to the Dutch data, with similar or even less clustering in the child arm on the no preference option.

Discussion
This study examined the impact of the framing of age on values for EQ-5D-Y health states in the Netherlands and China. We tested this issue using a DCE duration approach and a task that assessed preferences for QALY composition. The empirical findings indicated that the values derived from the [...] No evidence for age dependency of health state values was found in the Netherlands. Our results for the 10-year-old arm are consistent with Kreimeier's TTO results (Kreimeier et al., 2018). Based on international results, Kreimeier reported that TTO values applied to children generally were higher compared to values of adults, but in that study, the Dutch results were an exception. In the Netherlands, people gave a lower TTO value to a health state when it concerned a 10-year-old compared to themselves (Kreimeier et al., 2018). This indicates that Dutch respondents are prepared to trade off life years against the quality of life for children. In our research, the results also showed that respondents were prepared to trade off more time to avoid pain in children than in adults, resulting in lower values, although generally, the agreement of health state values for different ages was quite strong. While the congruence between studies supports the validity of our findings, care should still be taken when generalizing our results to other countries. Stronger evidence for age dependency of values was found in China, where the inclusion of the 70-year arm increased the contrast between groups.
Our estimation of health state utilities followed a state-of-the-art DCE duration approach, requiring a multiplicative utility function that involves a non-linear discount function. The estimated discount rates indicated that respondents valued quality of life in the short term more than in the long term, which was anticipated, and as argued by Jonker and Bliemer (2019), valid health state utility values can only be obtained if the model adequately accounts for such time preferences. The estimated discount rates were, however, relatively high compared to the standard rates usually applied in economic evaluations, especially in China. While the discount rates were still within the range of previously estimated discount rates for health-related outcomes (Attema et al., 2018), their reliability needs to be established in future research. A limitation of the DCE duration method is that the best way to account for time preferences, especially in the presence of discounting, has not been identified. Discount rates can be computed in different ways. Models that account for non-linear time preferences are complex and have not yet been implemented in the standard software that we used for choice modeling, and this limits the modeling options (e.g., we cannot simultaneously account for preference heterogeneity and for non-linear time preferences). Furthermore, this way of assessing preferences places high demands on the design, necessitating interim design updates to ensure that the design is based on adequate priors, and the end results may still depend on the data quality obtained along the way. We excluded speeders post hoc, not before design updates.
When examining preferences for a subject like a trade-off between life years and quality of life, we also need to carefully consider what advantages and disadvantages different valuation methods may have when used in such a context. We consider it possible that the use of TTO poses even greater challenges than the DCE, given the accuracy required when rating health states through direct assessment. A specific result that may be worth noting is the larger clustering of responses in the child arm vs. the other arms on the no preference answer option in the QALY composition task in the Netherlands. This might indicate that a larger fraction of respondents in the child arm felt uncertain when trading off quality of life and life years. However, it is also possible that respondents were neutral about their preference for either one of the options and considered them equivalent. Either way, it shows that more respondents in the child arm were reserved when making a choice. The Chinese results, however, appeared to show a reversed pattern, with respondents in the child arm being more certain when making a decision. A possible explanation is a cultural difference: paternalism is more prevalent in China.
The findings of this study may be taken into consideration for future updates of the EQ-5D-Y valuation protocol. EQ-5D-Y values are currently elicited from adults who value health states accruing to a 10-year-old child (Ramos-Goñi et al., 2020). This study reflects on the appropriateness of using a specified age (here, 10 years of age) in the elicitation of values that are used across a wider age group by varying the specified age. Age dependency of values was limited in the Netherlands, suggesting that values elicited for a 10-year-old child may also be validly applied for a 15-year-old. However, in China, the values for 70-year-olds differed strongly from the values for other ages, suggesting that the appropriateness of using a fixed, specified age may be questioned. Moreover, many respondents indicated that their choices would have been different if the health state had been experienced by themselves rather than by someone else. This finding is in line with results from other studies (Lipman et al., 2021; Reckers-Droog et al., 2022). More research on the sensitivity of values to age and perspective is warranted.

Conclusion
Age dependency was observed in the stated preferences for hypothetical health states. The magnitude and direction of age effects in values seemed dimension- and country-specific. In the Netherlands, we found a few differences in dimension-specific weights elicited for 10- and 15-year-olds compared to 40-year-olds, but the overall age dependency of values was limited. A stronger age dependency of values was observed in China, where values for 70-year-olds differed strongly from the values for other ages. The appropriateness of using existing values beyond the age range for which they were measured needs to be evaluated in the local context.

FIGURE Example choice set health state A and B.

FIGURE Example choice set health state B and C.

FIGURE (A) Utility decrements per EQ-5D-Y dimension severity level in the Netherlands. (B) Utility decrements per EQ-5D-Y dimension severity level in China.

FIGURE (A) Distribution of Likert responses by scenario in the Netherlands. (B) Distribution of Likert responses by scenario in China.
TABLE Characteristics of the study samples.

TABLE A
Results non-linear preferences on a latent scale Dutch population. The dimension terms indicate an interaction between the domain (e.g., mobility) and years, as described under the heading Results discrete choice experiment.

TABLE B
Results non-linear preferences on a latent scale Chinese population.