Population norms for the EQ-5D-3L in China derived from the 2013 National Health Services Survey

Background EQ-5D-3L is one of the most commonly used instruments for assessing health-related quality of life and conducting cost-utility analyses, but national population norms based on Chinese preference-based value sets have not been available. This study aims to develop population norms for the EQ-5D-3L in China in order to encourage appropriate use and interpretation of the instrument.
Methods Data were extracted from the 2013 National Health Services Survey on a nationally representative sample of 188 720 participants. The utility index based on the 2018 Chinese preference-based value sets was calculated for participants with different demographic and socio-economic characteristics. Differences in reported problems, visual analogue scale (VAS) scores and utility index scores were tested using logistic, linear and Tobit regression models, respectively.
Results The Chinese respondents were less likely to report problems on the EQ-5D dimensions compared with most populations in other countries. Pain/discomfort was the most commonly reported problem (12.6%). The low level of reported problems resulted in a high ceiling effect (84.19%) on the utility index and high mean scores for the utility index (0.985 ± 0.056) and VAS (80.91 ± 13.74) in the Chinese population. Those who were younger, better educated, employed, married, had no illness condition, lived in a more developed region and had a higher income obtained higher scores in both VAS and utility index. The VAS and utility index scores were also associated with gender, residency and lifestyles, but not always in a consistent way. Male and rural residents had higher VAS scores, but not higher utility index scores, than their female and urban counterparts.
Conclusions This study provides national population norms for the EQ-5D-3L based on the 2018 Chinese preference-based value sets. The norms can be used as a reference for health evaluation studies. Caution is needed in presenting and interpreting the utility index results given the high ceiling effect of the EQ-5D-3L instrument.

and comparisons [5,6], which is particularly important when there is no control group. Population norms can help define what is deemed normal or abnormal in different cultures. It is important to note that population norms may also change over time [7-9]. For HRQoL, this could be an indication of changes in population values on health or their health status or both. Therefore, the interpretation of HRQoL results has to be anchored in the historical and cultural contexts of the assessed population.
The EQ-5D, developed by the EuroQol Group in the 1990s, is one of the most commonly used instruments for assessing HRQoL. It is a simple and psychometrically sound instrument available in more than 170 languages [10]. The EQ-5D has three versions: EQ-5D-3L, EQ-5D-5L, and EQ-5D-Y. The first two were designed for adult populations and the third for children and adolescents aged 7 to 12 years. The EQ-5D instrument consists of a descriptive system and a self-report visual analogue scale (VAS). The descriptive system contains five items measuring the dimensions of mobility, self-care, usual activities, pain/discomfort and anxiety/depression, respectively. Each dimension is assessed using either a three-point (no, moderate, severe) scale (EQ-5D-3L) or a five-point (no, slight, moderate, severe, extreme) scale (EQ-5D-5L). The combination of levels across the five dimensions defines a health state, which can be converted into a single summary index value (also known as the utility index) in line with the health preferences of the general population: a utility index of 0 indicates death while 1 indicates full health. The VAS records self-rated general health on a vertical visual analogue scale ranging from 'Worst imaginable health state' (0) to 'Best imaginable health state' (100) [11,12]. The utility index of individuals or subpopulations needs to be examined against their population norms. More than 30 countries have established national population norms for the EQ-5D [3,4,6,7,13-26]. These norms have been successfully used in health economic evaluation and other patient-reported outcome-based studies [24].
The EuroQol Group provided population norms for the EQ-5D-3L utility index in China based on a small sample size (n = 8031) using a scoring algorithm derived from the European preference-based value sets [27,28]. However, some researchers expressed concerns about reporting the EQ-5D-3L results based on value sets derived from other countries [29]. For example, Clemens and colleagues noted the differences in the EQ-5D utility index based on the value sets derived from the preference of the local Australian population in comparison with those resulting from the UK and the US value sets [16]. Such differences may be even more profound given the greater cultural difference between China and the European countries. Indeed, Wu and colleagues identified significant differences in the utility index for some health states between the results based on the Chinese preference value sets and those derived from the UK and Japanese populations [30]. Sun and colleagues [31] reported problems in the five dimensions of the EQ-5D-3L without converting the results into utility index using the European value sets as recommended by Janssen and colleagues [27]. A large number of countries have established their own national value sets [32].
The EQ-5D-3L has been validated in a range of Chinese populations [33,34], including the general public [31] and those with disease conditions such as hypertension [35], diabetes [36], cancer [37], and heart disease [29]. Since 2008, the EQ-5D-3L instrument has been included in the National Health Services Survey (NHSS) in China [38]. It has been recommended as a tool for conducting health technology assessment in China [29]. In 2014, Liu GG et al published the EQ-5D-3L value sets derived from a sample of urban Chinese populations [29], which prompted the development of local population norms in several provinces in China [35,39]. However, Liu's value sets suffered from a serious bias toward big cities [40], and significant urban-rural differences in public preferences on health exist even after controlling for variations in socioeconomic status [31]. In 2018, a new version of the value sets was made available by Zhuo L et al using a nationally representative sample [41]. This enabled us to establish the national population norms for the EQ-5D-3L in China, which are essential for the appropriate use and interpretation of the EQ-5D instrument.

Study design and data collection
A cross-sectional survey was conducted using the EQ-5D-3L in China. Data were obtained from the 5th National Health Services Survey undertaken in September 2013 [42]. The NHSS is the largest nationally representative household survey in China and has been organised by the Centre for Health Statistics and Information of the Ministry of Health every five years since 1993. The NHSS followed a robust design and strict protocol. Data collected in the NHSS have been widely used in health services research and policy decision making [43-46]. The 5th NHSS adopted a four-stage stratified cluster random sampling strategy in selecting 93 600 households from 1560 communities/villages, covering 780 sub-districts/townships in 156 cities/counties representing the 31 provinces in mainland China. Details on the sampling procedure used in the NHSS have been reported elsewhere [43,45].
Each household member was interviewed face-to-face separately by a trained local medical worker. The interview covered the demographic and socioeconomic characteristics, lifestyle and behaviors, self-reported health, and use of health care services of the respondents. The quality of the returned questionnaire data was checked by the survey supervisors through a repeated survey of 5% of the participating households, which resulted in a 97.7% consistency. The sample was shown to be representative of the national population without age bias, as indicated by the Myers' index (2.55), DELTA dissimilarity coefficient (0.085) and GINI concentration ratio (0.0525) [42].
The EQ-5D-3L was administered to those aged 15 years and older, in line with other recent studies [15,35,47]. A total of 273 688 respondents completed the questionnaire, including 230 064 who were eligible for the EQ-5D-3L survey. In this study, we followed the EuroQol EQ-5D-3L User Guide [48] and excluded the returned questionnaires completed by a proxy respondent (a maximum of 30% of proxy responses was allowed in the NHSS for household members who were absent at the time of the survey). We further excluded a very small number (196) of returned questionnaires containing missing values in the EQ-5D-3L. This resulted in a final sample size of 188 720 (82% of eligible participants) for data analyses (Figure 1).

Measurements
We presented population norms for the EQ-5D-3L by gender and age. Given the large regional disparities in social and health development, the populations were further divided into eastern developed, western under-developed, and central regions in between, as well as urban vs rural.
The population norms were described using three indicators: percentage of reported problems on the five dimensions of the EQ-5D-3L, utility index, and VAS scores. The combinations of reported problems generated 243 possible health states. Each health state was assigned a value (utility index, ranging from 0.170 to 1.000) according to the nationally representative Chinese value sets developed by Zhuo and colleagues [41]. The Chinese value sets were derived from public preferences using the time-trade-off (TTO) technique as recommended by the EuroQol Group [41]. The VAS score is an indication of overall health perceived by an individual. The respondents were asked to rate their health along a scale ranging from 0 (worst health) to 100 (full health).
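The enumeration of health states and their mapping to a utility index can be sketched in Python as follows. This is a minimal illustration only: the decrement values are made-up placeholders, not the coefficients published by Zhuo and colleagues [41], and the purely additive form is an assumption (a published scoring algorithm may include further terms). The names `DIMENSIONS`, `PLACEHOLDER_DECREMENTS`, `all_health_states` and `utility_index` are hypothetical.

```python
from itertools import product

DIMENSIONS = ("mobility", "self_care", "usual_activities",
              "pain_discomfort", "anxiety_depression")

# HYPOTHETICAL decrements for level 2 (moderate) and level 3 (severe).
# Placeholders for illustration; NOT the published Chinese value set [41].
PLACEHOLDER_DECREMENTS = {
    "mobility": {2: 0.05, 3: 0.20},
    "self_care": {2: 0.04, 3: 0.15},
    "usual_activities": {2: 0.03, 3: 0.12},
    "pain_discomfort": {2: 0.04, 3: 0.18},
    "anxiety_depression": {2: 0.03, 3: 0.15},
}


def all_health_states():
    """Return every EQ-5D-3L profile as a 5-tuple of levels (1, 2 or 3)."""
    return list(product((1, 2, 3), repeat=len(DIMENSIONS)))


def utility_index(profile, decrements=PLACEHOLDER_DECREMENTS):
    """Additive sketch: start at full health (1.0) and subtract one
    decrement for each dimension with a reported problem."""
    value = 1.0
    for dimension, level in zip(DIMENSIONS, profile):
        if level > 1:
            value -= decrements[dimension][level]
    return round(value, 3)


states = all_health_states()
print(len(states))                      # 3 levels ** 5 dimensions = 243
print(utility_index((1, 1, 1, 1, 1)))   # full health scores 1.0
```

With a real value set, only the decrement table (and any extra terms of the scoring algorithm) would need to be replaced; the enumeration of the 243 profiles stays the same.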
Factors associated with the EQ-5D-3L results were identified in line with the social determinants of health model proposed by the World Health Organisation [49]. Previous studies suggest that apart from age and gender, residency, education, employment, marital status, household income, illness conditions, lifestyle and behaviors are also significant predictors of HRQoL in China [31]. In this study, household income was categorised into quintiles according to the income distributions of the local cities or counties. Illness conditions were captured by reported acute conditions over the two weeks prior to the survey (yes or no), chronic conditions diagnosed by a doctor during the previous six months (yes or no), and episodes of hospital care over the previous 12 months (yes or no). Lifestyle and behaviors were measured by the current status of smoking (yes or no), drinking (yes or no during the last 12 months), and weekly physical exercise (yes or no over the last 6 months). Further details about the definitions of these variables can be found in the NHSS guidelines [42].

Statistical analysis
We calculated the percentage of respondents reporting problems on each of the five dimensions of the EQ-5D-3L, as well as the percentage of respondents who reported problems in any dimension. Population differences in relation to the reported problems were tested using Pearson χ² tests. Multivariate binary logistic regression models (with or without problems) were performed to identify the factors associated with the reported problems.
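The descriptive percentages described above can be computed along the following lines. This is a sketch under stated assumptions rather than the authors' STATA code: each respondent is assumed to be stored as a 5-tuple of levels (1 = no problem, 2 = moderate, 3 = severe) in a fixed dimension order, and the function name `problem_rates` and the toy data are hypothetical.

```python
def problem_rates(profiles):
    """Return (per-dimension % with problems, % with any problem,
    % with no problem at all) for a list of EQ-5D-3L profiles."""
    n = len(profiles)
    if n == 0:
        raise ValueError("at least one profile is required")
    # A level above 1 on a dimension counts as a reported problem.
    per_dimension = [
        100.0 * sum(1 for p in profiles if p[i] > 1) / n
        for i in range(5)
    ]
    any_problem = 100.0 * sum(
        1 for p in profiles if any(level > 1 for level in p)
    ) / n
    # Share in full health (profile 11111), i.e. the ceiling effect.
    ceiling = 100.0 - any_problem
    return per_dimension, any_problem, ceiling


# Toy data: 8 respondents in full health, 2 reporting problems.
sample = [(1, 1, 1, 1, 1)] * 8 + [(1, 1, 1, 2, 1), (2, 1, 1, 2, 2)]
per_dimension, any_problem, ceiling = problem_rates(sample)
print(per_dimension)  # [10.0, 0.0, 0.0, 20.0, 10.0]
print(any_problem)    # 20.0
print(ceiling)        # 80.0
```

The percentage with any problem and the ceiling effect always sum to 100, which is why the two figures are complementary in the results.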
Means with standard deviations (SD) and medians with interquartile ranges (IQR) of the utility index and VAS scores of the EQ-5D-3L were calculated in line with those of previous studies [4,15,24,27,47]. Population differences in the utility index and VAS scores were tested using Student's t tests or analysis of variance (ANOVA), and Wilcoxon or Kruskal-Wallis rank sum tests. Multivariate linear regression models and Tobit regression models were performed to identify the factors associated with the VAS and the utility index scores, respectively. The Tobit approach was recommended by Zhang for censored or bounded data [35,50]. The significance level of the statistical analyses was set at 0.05. All statistical analyses were performed using STATA version 14.0 (SE) for Windows (StataCorp LLC, College Station, TX, USA). An enter approach was adopted in the regression modelling, with all of the independent variables being coded as categorical variables and compared with a reference group. Given that statistical significance can simply be a function of large sample size, we also used the Cohen effect size (average difference in the score divided by the standard deviation of the score in the group for comparison) [51,52] to judge the significance of the differences in the utility index and VAS scores. According to Cohen, a size of 0.2 indicates a small effect, while 0.5 and 0.8 indicate a medium and a large effect size, respectively. A medium effect size (0.5) is usually considered as a difference with clinical meaning [52].
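As an illustration of the effect size described above, the following Python sketch divides the difference in mean scores by the standard deviation of the comparison (reference) group. Reading "the group for comparison" as the reference group is an assumption, as are the function names `effect_size` and `effect_label`, the use of the sample standard deviation, and the "negligible" label for values below 0.2.

```python
from statistics import mean, stdev


def effect_size(group, reference):
    """Difference in mean scores divided by the SD of the reference group."""
    sd_reference = stdev(reference)  # sample SD; needs >= 2 observations
    if sd_reference == 0:
        raise ValueError("reference group has zero variance")
    return (mean(group) - mean(reference)) / sd_reference


def effect_label(d):
    """Classify an effect size using Cohen's conventional cut-offs."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"


# Toy VAS scores, not survey data.
reference_group = [70, 80, 90]
comparison_group = [75, 85, 95]
d = effect_size(comparison_group, reference_group)
print(d, effect_label(d))  # 0.5 medium
```

Because the effect size is scaled by a standard deviation rather than a standard error, it does not grow with sample size, which is the reason for reporting it alongside P values in a survey of this scale.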

Ethics
The present study is a secondary analysis of the NHSS 2013 data. The NHSS obtained ethics approval from the institutional review board of the Chinese National Bureau of Statistics (license number 2013-65). Informed consent was obtained from all the respondents prior to the survey. All procedures performed in the study were in accordance with the ethical standards of the Chinese National Bureau of Statistics and with the 1975 Helsinki declaration.

Characteristics of respondents
Slightly more than half of the respondents (52.4%) were women. The distribution of respondents in terms of gender, region, education, employment and marital status resembled that of the national population [42]. However, this study sample contained a higher proportion of respondents aged 65 years and older (17.84%) compared with the national average of 11.6% [53] (Table 1).

Percentage of reported health problems
Pain/discomfort was the most frequently reported problem (12.6%), followed by problems in mobility (5.9%) and anxiety/depression (5.3%). The least reported problem was in self-care (3.1%) (Table 2). Women were more likely than men to report problems in relation to pain/discomfort and anxiety/depression, but less likely in the other dimensions of the EQ-5D. The proportion of respondents reporting problems increased with age. Socioeconomic gradients were evident, and those with a higher socio-economic status (better educated, higher income and employed) were less likely to report problems. Rural residents and those residing in the less developed central and western regions were more likely to report problems on all of the five dimensions than others. Those who did not smoke or drink reported more problems than those who did (Table 3).
Similarly, the distribution of the VAS score was also negatively skewed (Figure 3). More than 40% of respondents reported a VAS score higher than 90. The VAS scores were correlated with the utility index scores (r = 0.4537, P < 0.05) in the total sample (Figure 4), as well as in the subsamples stratified by gender and age (Figures S1 to S9 in the Online Supplementary Document).
Men and rural residents had higher VAS scores (but not higher utility index scores) than women and urban residents. Younger respondents had higher VAS and utility index scores than their older counterparts. Socioeconomic gradients were evident, and those with a higher socio-economic status (eg, better educated, higher income, employed and living in developed areas) had higher utility index and VAS scores. Those who did not smoke or drink had a lower utility index, but not lower VAS scores (Table 1 and Table 4). In addition, the effect size of differences in age, region, educational attainment, income level, employment status, marital status, two-week morbidity, chronic disease, hospitalisation, and exercising on the utility index exceeded the medium effect size of 0.5 (Table 4). The median and IQR results also supported these findings (Table S2 in the Online Supplementary Document).
In the supplementary file (Table S3 and Table S5 in the Online Supplementary Document), we presented the means and standard deviations of the VAS and utility index scores, respectively, by gender, age, region and residency. The respondents living in the eastern developed region had the highest VAS and utility index scores in all age groups. In contrast, those living in the western under-developed region had the lowest VAS and utility index scores. The urban-rural disparities appeared to vary by region. Urban residents residing in the central region had a higher utility index score for all age groups than their rural counterparts. But in the eastern and western regions, urban residents aged between 15 and 54 years had a lower utility index than their rural counterparts (Table S5 in the Online Supplementary Document). Similar results can be found in the medians and IQRs of the VAS and utility index scores presented in the supplementary file (Table S4 and Table S6 in the Online Supplementary Document).

DISCUSSION
To the best of our knowledge, this is the first paper to provide Chinese EQ-5D-3L population norms based on a large sample using the nationally representative preference-based value sets. The population norms, presented in the percentage of reported problems, means (standard deviations), and medians (interquartile ranges) of VAS and utility index scores, can serve as reference for comparative purposes in HRQoL studies and health economic evaluation studies in China.
Overall, the Chinese people have relatively higher utility index scores compared with those from other countries [4,7,11,16,18,19,24,27], although the mean value (0.985) is a bit closer to those in Singapore (0.950) [25] and Korea (0.958) [54]. The European value sets would bring the mean utility index score down to 0.951, which is still high compared with other populations [27,55]. Previous studies using the value sets derived from a small sample of urban populations in big cities may have also underestimated the HRQoL of the Chinese populations (Table S7 in the Online Supplementary Document).
Ceiling effects are profound in the Chinese populations (especially in the young groups), with 84.2% reporting no health problems compared with 41.3% in Portugal [19], 47.1% in Poland, 62.4% in Spain [47], 68.0% in Japan [4], and 79.0% in Singapore [25]. Similar findings were also reported in previous studies in China [31,39,44]. Overall, ceiling effects of the EQ-5D-3L instrument were relatively high in Asian countries [4,25,27,31,39,44]. However, the percentage of reported problems in China is not always the lowest in comparison with other countries. Some European countries (such as Denmark, Germany, Sweden and Switzerland) reported even lower levels of problems in some dimensions [56,57]. As a result, the EQ-5D-3L utility index needs to be used and interpreted with caution. Empirical evidence shows that Asian populations are less likely to report health problems than their European counterparts, inflating the utility index [27]. It is important to note that the percentages of respondents in this study reporting problems on the five dimensions of the EQ-5D-3L were lower than those in most other countries [3,4,7,11,13,16-19,22,24,25,27,47,54,58,59]. Similar to studies undertaken in other countries, pain/discomfort was the most frequently reported problem [11,60]. But only about 12% of the Chinese respondents reported problems in pain/discomfort, much less than in other countries, where the figure could be as high as 65.0% [3,4,7,11,13,16-19,22,24,25,27,47,54,58,59]. The next most frequently reported problems in the Chinese populations were mobility (5.9%) and anxiety/depression (5.3%), again at a level lower than in other countries [3,4,7,11,13,16,18,19,22,24,27,47,54,58,59].
The only exceptions are the lower level of reported problems in mobility (3.6%) in the Singapore population [25] and the lower level of reported problems in anxiety/depression in the Dutch (3.5%) and German (4.3%) populations [11,27]. … [17], but higher than those in some other countries (ranging from 71.1 to 80.0) [3,11,19,24]. Further analyses demonstrated a wide distribution of VAS scores in those who reported no health problems, indicating a low discriminatory power of the utility index (Table S7 in the Online Supplementary Document).
There exists a gender gap in HRQoL of the Chinese populations. Overall, male respondents were less likely to report health problems than female respondents, resulting in higher VAS and utility index scores. However, after controlling for variations in other factors, the effect size in the differences in gender on the utility index was only 0.05. This phenomenon was also observed in studies in Singapore [25] and other studies in China [31,35,39,61]. Szende and colleagues believe that gender plays a small role in explaining HRQoL [11]. It is worth noting that gender variations in utility index and VAS scores were not always consistent. Women had lower scores in VAS than men, but not in utility index. These inconsistencies may be due to the conceptual differences in the two measurements [62]. VAS reflects a direct individual real-time rating considering all aspects of health; whereas, utility index is an indirect measurement, using past time value sets to estimate current states considering limited dimensions of health [38]. Women may have a relatively higher expectation on health, resulting in lower ratings on VAS [31,44]. Similar to other studies [4,6,7,11,13,16,19,22,24,31,39,60,63], we found that older age is associated with lower HRQoL (with an effect size of 0.91-4.20 on the utility index). China is currently experiencing unprecedented rapid transition to an ageing society as a result of the decades long family planning policy [64], which could lead to an overall decline in HRQoL of the entire population.
Regional and residential disparities in HRQoL in the Chinese populations are evident. Those who resided in the eastern developed region had higher HRQoL than their central and western counterparts, with a medium effect size (0.34-0.55) on the utility index. Rural respondents in this study had higher VAS ratings than their urban counterparts although they were more likely to report problems in mobility, self-care, and usual activities. Unlike the utility index, which applies value sets elicited in the past, VAS scores reflect real-time individual subjective ratings. Rural residents may have relatively lower expectations of health, resulting in higher ratings on VAS [31,44]. According to the Guide Book of EQ-5D-3L, both utility index and VAS scores should be presented when reporting results. Younger rural residents (≤55 years) had higher utility index scores than their urban counterparts, especially in the eastern and western regions. By contrast, older rural residents (≥65 years) had lower utility index scores than their urban counterparts. These results are consistent with findings reported in previous studies in China and elsewhere [31,35,39,44,61,63]. However, the urban-rural differences revealed in this study are small based on effect size (0.03-0.05). Nevertheless, it is important to note that population mobility is high in China: young people in rural and undeveloped regions are increasingly moving to urban and more developed regions [64,65]. This could accelerate population ageing in rural areas, lowering the utility index of rural residents.
Socioeconomic gradients in HRQoL as measured by the three indicators (percentage of reported problems, utility index and VAS scores) deserve increased attention. We found that those with a higher socio-economic status (eg, richer, better educated, employed and married) have significantly higher HRQoL than others, after controlling for variations in demographic and health characteristics. These results are consistent with findings from previous studies in China and elsewhere [4,6,7,11,13,16,22,25,31,39,60,63]. The effect size on the utility index reached 0.35-0.76 for education, 0.37-0.74 for income, 0.29-1.11 for employment, and 0.02-0.75 for marital status.
This study adds evidence to support the proposed association between physical activity and HRQoL [4,6,7,11,13,16,22,31,39,60,63]. The effect size of regular exercise (0.66) on the utility index indicates clinical significance. However, the associations between HRQoL and smoking and drinking are small, albeit statistically significant, and fail to reach a clinically meaningful level according to the effect size (0.16 for smoking and 0.13 for drinking). Positive associations between HRQoL and smoking and drinking were reported in previous studies conducted elsewhere in China [66-68].
Limitations: This study has several limitations. First, this is a cross-sectional study and no causal relationships should be assumed for the findings. Second, there is a high ceiling effect for the EQ-5D-3L utility index. Although the EQ-5D-5L descriptive system may improve the discriminatory power, its ceiling effect is likely to stay high in the Chinese populations [69,70]. Further studies should examine the cultural responsiveness of the EQ-5D-3L instrument [56] and the appropriateness of the translated wording and phrasing (language) of its descriptive system [71] in the Chinese context. We reported mean values and standard deviations of the EQ-5D utility index in line with other studies despite its high ceiling effect [4,15,24,27,47]. Non-normal distributions of EQ-5D utility index scores are common in EQ-5D studies: full health status was reported by 79% of the Singaporean population, 68% of the Japanese population, and more than 50% of the populations in Poland, France and Spain [4,18,24,27,47,60]. Third, the EQ-5D instruments measure limited dimensions of health due to the small number (five) of items. The small number of items makes it easier to generate a utility index score, which is often absent from a more comprehensive instrument (for example the SF-36) [38,72,73]. However, the implications of the utility index need to be interpreted with caution, and in conjunction with analyses of the problems reported and VAS scores.

CONCLUSIONS
This paper provides population norms for the EQ-5D-3L in China stratified by age, gender and region based on the 2018 population preference-based value sets derived from a nationally representative sample. The norms can be used as a reference for health economic evaluation studies. Overall, 15.8% of respondents reported a health problem, with pain/discomfort being the most commonly reported problem. Compared with other populations, the Chinese people have high scores in VAS and utility index. Those who are richer, better educated, employed, married, and live in developed areas have higher scores in both VAS and utility index than others. Further studies are needed to explore the underlying reasons for the sociodemographic differences.
Given the high ceiling effect and low discriminatory power of the utility index, caution should be taken in presenting and interpreting mean values of the utility index. The meaning and implications of the EQ-5D-3L utility index need to be interpreted in conjunction with other indicators, including the nature and number of problems reported, the distributional position of the health state, and the VAS scores. However, we acknowledge that the ceiling effect of the utility index is less serious in some subpopulations, such as the elderly and those with existing illness conditions, as revealed in this study. This suggests that it may be more appropriate to use the EQ-5D-3L instrument in those subpopulations rather than in the entire general population. We advocate the use of percentile indicators for presenting population-based results of the EQ-5D-3L utility index, and further studies into a health state descriptive system tailored to the specific context of the Chinese populations.