A Systematic Review of Studies Eliciting Willingness-to-Pay per Quality-Adjusted Life Year: Does It Justify CE Threshold?

Background A number of studies have estimated willingness to pay (WTP) per quality-adjusted life year (QALY) among patients or the general population for various diseases. However, no systematic review has summarized the relationship between WTP per QALY and the cost-effectiveness (CE) threshold based on the World Health Organization (WHO) recommendation.
Objective To systematically review the WTP per QALY literature, to compare WTP per QALY with the CE threshold recommended by WHO, and to determine potential influencing factors.
Methods We searched MEDLINE, EMBASE, PsycINFO, Cumulative Index to Nursing and Allied Health Literature (CINAHL), Centre for Reviews and Dissemination (CRD), and EconLit from inception through 15 July 2014. To be included, studies had to estimate WTP per QALY in health-related issues using a stated preference method. Two investigators independently reviewed each abstract, completed full-text reviews, and extracted information from included studies. We compared WTP per QALY with GDP per capita and analyzed and summarized potential influencing factors.
Results Of 3,914 articles found, 14 studies were included. Most studies (13 of 14, 92.86%) used the contingent valuation method, while only one study used discrete choice experiments. Sample size varied from 104 to 21,896 persons. The ratio of WTP per QALY to GDP per capita varied widely, from 0.05 to 5.40, depending on scenario outcomes (e.g., whether the intervention extended or saved life or improved quality of life), severity of hypothetical scenarios, duration of scenario, and source of funding. The average ratio of WTP per QALY to GDP per capita for extending or saving life (2.03) was significantly higher than the average for improving quality of life (0.59), with a mean difference of 1.43 (95% CI, 1.06 to 1.81).
Conclusion This systematic review provides an overview of all studies estimating WTP per QALY. The variation in the ratio of WTP per QALY to GDP per capita, which depended on several factors, may prompt discussion of CE threshold policy. Our research provides a foundation for defining the future direction of decision criteria for an evidence-informed decision-making system.


Introduction
Cost-effectiveness (CE) thresholds are commonly used as cut-offs for deciding whether an intervention is cost-effective [1]. Despite controversy over whether a threshold should be set at all, CE thresholds have been used implicitly or stated explicitly in various countries [2-6]. After a number of years of using health technology assessment (HTA) as part of decision making, the CE thresholds in some countries, e.g. the UK and Australia, have become more apparent [5,7,8]. Several methods, such as expert opinion, human capital, willingness to pay (WTP), and the WHO recommendation, have been used to estimate WTP per quality-adjusted life year (QALY) values [2,9-11]. However, how to derive appropriate cut-offs remains unresolved.
A CE threshold is defined as the maximum amount of money per unit of health outcome that a jurisdiction decides to pay for adopting a technology or an intervention [1]. Various jurisdictions refer to the World Health Organization (WHO) recommendation for their CE thresholds, which is based on one to three times the gross domestic product (GDP) per capita per disability-adjusted life year (DALY) as a cut-off [10,11]. However, in practice, the CE threshold is usually expressed as cost per QALY, and most CE threshold studies have been based on QALYs [12-14]. WTP per QALY, which reflects the maximum amount one would be willing to pay to gain an additional QALY, is another economic concept that has been used to justify CE thresholds [9,15-18]. A number of researchers have conducted WTP per QALY studies to understand how patients or the general population value one QALY gained in various diseases [9,18-27]. However, no evidence has linked or compared WTP per QALY with the CE threshold based on the WHO recommendation in any country. Such linkages or comparisons would not only reflect decision makers' justification for adopting a technology or intervention but also help all stakeholders, including the pharmaceutical industry, to better understand decision criteria. In addition, although several previous stated preference studies revealed that WTP per QALY values varied depending on how scenarios were specified, the elicitation instrument used, and other factors [6,14,20,21,24-29], there has been no comprehensive summary of the literature on how these factors affect WTP per QALY values. The objectives of this study were therefore to systematically review the WTP per QALY literature and to compare WTP per QALY with the CE threshold recommended by WHO for each country. Potential factors influencing the ratio of WTP per QALY to the CE threshold were also examined.

Data sources and search strategy
Various databases, including MEDLINE, EMBASE, PsycINFO, the Centre for Reviews and Dissemination (CRD), the Cumulative Index to Nursing and Allied Health Literature (CINAHL), and EconLit, were systematically searched from their inception until 15 July 2014. Medical Subject Headings (MeSH) and keywords used for the search included 1) (willingness to pay or contingent valuation or discrete choice experiment) AND (quality adjusted life year or QALY), OR 2) willingness to pay for (per) quality adjusted life year. There were no language restrictions.

Study selection and data extraction
Studies were included if they met the following criteria: 1) an original article eliciting WTP per QALY, 2) using a stated preference method, and 3) estimating WTP per QALY in health-related issues. Two investigators (K.N. and K.V.) independently reviewed each abstract, completed full-text reviews, and extracted information from each study for inclusion in the analysis. Data extracted from each study were year of publication, year of study, country, number of countries per study, characteristics of hypothetical scenarios, number of scenarios per study, sample size, sampling method, mode of administration, interviewer, WTP elicitation method (WEM), number of WEMs per study, utility elicitation method (UEM), number of UEMs per study, types of respondents, respondents' income, and WTP per QALY values. Since many studies were conducted in a number of countries and/or scenarios, more than one WTP per QALY value could be obtained from each of them [24]. During data extraction, the trimmed median of WTP per QALY was preferred to the median, and the trimmed mean was preferred to the mean. The median was preferred to the mean because cost-derived data are generally skewed, and trimmed values were preferred because outliers were excluded from the analysis [30].
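The preference order for summary statistics can be illustrated with a minimal Python sketch. It assumes a symmetric trim that drops a fixed proportion of observations from each tail; the trimming rule, the function names, and the response values are hypothetical and are not taken from any reviewed study.

```python
from statistics import mean, median


def trimmed(values, proportion=0.05):
    """Drop the lowest and highest `proportion` of observations (assumed rule)."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # number of values removed from each tail
    return ordered[k:len(ordered) - k] if k else ordered


def preferred_summary(values, proportion=0.05):
    """Return summaries listed from most to least preferred."""
    core = trimmed(values, proportion)
    return {
        "trimmed_median": median(core),
        "median": median(values),
        "trimmed_mean": mean(core),
        "mean": mean(values),
    }


# Hypothetical right-skewed WTP responses in currency units (illustration only).
wtp_responses = [500, 800, 1000, 1200, 1500, 1800, 2000, 2500, 3000, 60000]
summary = preferred_summary(wtp_responses, proportion=0.10)
print(summary)
```

In this made-up sample the single extreme response pulls the untrimmed mean far above the other three summaries, which is the behaviour that motivates preferring medians and trimmed values for skewed data.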

Data analysis
A descriptive analysis was conducted. All values were converted to US dollars ($) in the year of study based on exchange rates from the World Bank [31]. Each WTP per QALY value was compared with GDP per capita, which was obtained from the World Bank [32] for the year and country of study, and the ratio of WTP per QALY to GDP per capita was calculated. A number of factors were hypothesized to affect the relationship between WTP per QALY and GDP per capita [23-26]. They included outcomes, perspectives, severity of hypothetical scenarios, utility elicitation method, duration of scenario, type of respondent, country income level, and funding source [20,23-27]. These relationships were tested by independent-sample t-tests or ANOVA. All analyses were performed using SPSS 18.0 for Windows (Chicago, IL).
Respondents were asked to state their willingness to pay for treatments that could improve quality of life, extend life, or save life. Most studies (12/14, 85.71%) asked respondents to value a QALY in terms of improving quality of life [19-24,26-29,33,34], while only one study (7.14%) asked respondents to value extending life [6] and another study asked about all three outcomes: improving quality of life, extending life, and saving life [25]. In addition, only one study had respondents state WTP for disease prevention [26].
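As a rough illustration of the conversion and ratio described above, the following Python sketch converts a WTP per QALY value to US dollars and divides it by GDP per capita. The class, field names, and numbers are hypothetical; the band labels simply restate the one-to-three-times-GDP-per-capita range referred to in the Introduction.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    wtp_local: float           # WTP per QALY in local currency, year of study
    local_per_usd: float       # exchange rate: local currency units per 1 US dollar
    gdp_per_capita_usd: float  # GDP per capita in US dollars, same country and year


def wtp_to_gdp_ratio(e: Estimate) -> float:
    """Convert WTP per QALY to US dollars, then express it in GDP-per-capita units."""
    wtp_usd = e.wtp_local / e.local_per_usd
    return wtp_usd / e.gdp_per_capita_usd


def gdp_band(ratio: float) -> str:
    """Place a ratio relative to the 1x to 3x GDP per capita range."""
    if ratio < 1:
        return "below 1x GDP per capita"
    if ratio <= 3:
        return "within 1-3x GDP per capita"
    return "above 3x GDP per capita"


# Hypothetical inputs (not values from the reviewed studies).
low = Estimate(wtp_local=60_000, local_per_usd=30, gdp_per_capita_usd=5_000)
high = Estimate(wtp_local=600_000, local_per_usd=30, gdp_per_capita_usd=5_000)
for e in (low, high):
    r = wtp_to_gdp_ratio(e)
    print(round(r, 2), gdp_band(r))
```

Expressing each estimate in units of GDP per capita is what allows values from different countries and years to be set side by side.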

Comparisons between WTP per QALY and GDP per capita
All WTP per QALY values and their comparisons with the GDP per capita of each study's country are shown in Table 2. A total of 167 WTP per QALY values were obtained from the 14 studies. These values varied extensively with scenarios, utility and WTP elicitation methods, and perspectives. Overall, the reported WTP per QALY values fell between $2,019 [21] and $282,821 [20]. The mean (SD) and median WTP per QALY values were $34,309 ($55,390) and $9,921, respectively. When WTP values were compared with the GDP per capita of each country for the specific study year, the ratios of WTP per QALY to GDP per capita ranged from 0.05 [21] to 5.40 [20]. The mean (SD) and median of these ratios were 0.77 (0.89) and 0.43, respectively. Interestingly, among 167 observed values of WTP per QALY, more than three quarters (127/147, 86.39%) were below one GDP per capita for an additional QALY.
Associations between factors and the ratios of WTP per QALY and GDP per capita
Table 3 shows the relationships between various factors and the ratio of WTP per QALY to GDP per capita. The average ratio for extending or saving life (2.03) was significantly higher than the average for improving quality of life (0.59), with a mean difference of 1.43 (95% CI, 1.06 to 1.81). On average, the estimates from a societal perspective (2.16) were also significantly higher than those from an individual perspective (0.63) (p-value <0.01). The ratio of WTP per QALY to GDP per capita increased linearly with the severity of conditions (p-value <0.01). The ratio derived from indirect utility elicitation methods (1.45) was significantly higher than that derived from direct utility elicitation methods (0.45) (p-value <0.01). The ratio from studies in LMIC (0.97) was higher than that in non-LMIC (0.75), but the difference was not statistically significant (p-value = 0.35).
Duration of scenario was significantly associated with the ratio of WTP per QALY to GDP per capita (p-value <0.01). Scenarios of shorter duration (1 month to 1 year) tended to have higher WTP per QALY (p-value <0.01). Interestingly, the ratio of WTP per QALY to GDP per capita in studies funded by drug companies (1.62) was higher than the ratio in studies not funded by any drug company (0.80) (p-value <0.01). We also found a statistically significant negative association between sample size and the ratio of WTP per QALY to GDP per capita (a decrease of 0.30 per increment of 1,000 subjects).
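The mean difference and its interval can be sketched in Python as follows. This is only an illustration under stated assumptions: it uses unequal-variance standard errors and a normal approximation for the 95% critical value, whereas t-tests in SPSS use the t distribution, and the two groups of ratios are hypothetical values rather than the extracted estimates.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def mean_difference_ci(group_a, group_b, level=0.95):
    """Difference in means (a - b) with an approximate confidence interval."""
    diff = mean(group_a) - mean(group_b)
    # Standard error without assuming equal variances in the two groups.
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    z = NormalDist().inv_cdf(0.5 + level / 2)  # normal approximation (assumption)
    return diff, (diff - z * se, diff + z * se)


# Hypothetical WTP-per-QALY to GDP-per-capita ratios (illustration only).
extend_or_save_life = [1.2, 2.5, 1.8, 3.1, 1.6]
improve_quality_of_life = [0.3, 0.7, 0.5, 0.9, 0.4, 0.8]

diff, (lower, upper) = mean_difference_ci(extend_or_save_life, improve_quality_of_life)
print(f"mean difference {diff:.2f} (95% CI, {lower:.2f} to {upper:.2f})")
```

The interval is reported with the lower bound first, and an interval that excludes zero corresponds to a difference that is significant at the chosen level.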

Discussion
To the best of our knowledge, this is the first study to systematically review the literature on WTP per QALY and to examine whether the evidence justifies the CE threshold recommended by WHO. The review summarizes methods that could be used to improve future studies of this kind. In addition, it sheds light on the relationship between WTP per QALY and GDP per capita. The comparison between WTP per QALY research and the CE threshold should be valuable to policy makers because they could use this evidence to support the direction of future decisions. Even though this study did not reveal how WTP per QALY differed from the thresholds currently used in individual jurisdictions, policy makers could eventually compare these numbers with their 'implicit' thresholds.
The number of WTP per QALY studies has increased over the last decade. However, only 14 studies were included. The main reasons were that several studies were literature reviews [9,15,35-39], did not report WTP per QALY values [40-46], or were not related to health issues [47,48]. Half of the reviewed studies were conducted in European countries. One reason could be that many European countries have adopted HTA for decision making and have a strong network, e.g. EUnetHTA, for conducting this type of study.
In terms of methods, this review sheds light on various parts of study design, including samples, mode of administration, and scenarios. For study samples, most studies used the general population. The researchers might have regarded health care as a public good and believed that focusing on individual patients or diseases would not reflect the complete picture of society. This does not mean that WTP per QALY for particular diseases is not useful, since it could be used for other purposes. For instance, it could be used to bridge cost-effectiveness or cost-utility analysis with cost-benefit analysis, which has stronger theoretical grounding, in particular diseases [16,49]. In other words, the choice of sample should depend on the study objectives or applications. However, when WTP per QALY values were compared with GDP per capita, the ratios did not differ significantly between types of samples. This finding cannot be generalized, but it lessens concern about using research results from different types of samples in the future.
Respondents were asked to adopt either an individual or a societal perspective when answering the questions. Most studies asked respondents to use their own perspective, since it might be easier for them to imagine the given scenarios and their responses should therefore be more valid. The ratios of WTP per QALY to GDP per capita from the two perspectives were significantly different. This systematic review did not intend to determine which perspective is better, but it suggests that the perspective used in a study affects WTP per QALY values.
Our findings provide scientific evidence for the controversy over the use of a fixed CE threshold versus flexible CE thresholds [7,18,25]. For example, the US has used a fixed CE threshold of US$50,000 [2], while the Netherlands has applied different CE thresholds for interventions aimed at life-threatening conditions and for other conditions [50,51]. The ratios of WTP per QALY to GDP per capita varied substantially, especially between scenarios for extending or saving life and scenarios for improving quality of life, and the ratios were higher for scenarios with severe conditions than for those with mild conditions. These findings imply that a fixed CE threshold might not be appropriate, or that one CE threshold might not fit all circumstances.
In addition, the results showed that the ratios of WTP per QALY to GDP per capita were 0.59 for improving quality of life and 2.03 for extending or saving life. It is also important to note that all evidence on this difference came from studies conducted in non-LMIC, since no study conducted in LMIC compared WTP per QALY for extending or saving life with that for improving quality of life. This suggests the need for such a study in LMIC. Currently, some countries have adopted different CE thresholds [50,51].
The review showed that the primary WTP elicitation method was contingent valuation (CV). However, CV itself comprises several types of methods. Among them, the bidding game was used slightly more frequently than the others. One reason could be that the bidding game is similar to, or based on, the concept of the standard gamble, which is well known among health economists. However, CV methods have been criticized for various weaknesses [52]. For instance, respondents in CV are asked to consider whole health states or scenarios and decide how much they would be willing to pay. In reality, they might judge these health states or scenarios on only the attributes that are important to them. Among the reviewed studies, only one used discrete choice experiments (DCE). A potential reason is that DCE may have been introduced to health economics only recently [53-55]. However, DCE is based on a rigorous theory, random utility theory, and has been reported to measure utility well. DCE could potentially be used more in future research on WTP per QALY.
Direct methods, e.g. standard gamble (SG), time trade-off (TTO), visual analogue scale (VAS), or their combinations, were used more frequently. Only four studies used indirect methods, e.g. EQ-5D and SF-36. There are at least two possible explanations. First, indirect methods require tariffs, which might not be available in the countries studied. Second, indirect methods need to be validated in the study samples. Therefore, an indirect method might be less efficient or convenient, a cost incurred in exchange for the validity of utility measurement. On the other hand, direct methods would not only give studies strong theoretical grounding but also allow study results to be related to those of other studies.
Another interesting result that could stimulate controversy was that the ratios of WTP per QALY to GDP per capita differed significantly by type of funding. Studies funded by drug companies tended to have higher ratios. This observation is not intended to imply bias. Instead, it should be noted that the source of funding could affect WTP per QALY relative to GDP per capita. In addition, the significant negative association between sample size and the ratio of WTP per QALY to GDP per capita is worth noting. The higher ratios in studies with small sample sizes might be due to the small-study effect [56], a phenomenon in which studies with smaller sample sizes report higher values. Subjects included in small studies might be selected in a way that is prone to yield higher values.
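One simple way to express an association of this kind is the least-squares slope of the ratio on sample size, scaled per 1,000 subjects. The sketch below assumes ordinary least squares, which may differ from the analysis actually performed, and uses hypothetical sample sizes and ratios.

```python
from statistics import mean


def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


# Hypothetical study-level data (illustration only).
sample_sizes = [200, 500, 1000, 2000, 4000]
ratios = [1.9, 1.6, 1.5, 1.1, 0.6]

# Rescale sample size so the slope reads "change in ratio per 1,000 subjects".
slope_per_1000 = ols_slope([n / 1000 for n in sample_sizes], ratios)
print(f"change in ratio per 1,000 subjects: {slope_per_1000:.2f}")
```

A negative slope here means that larger studies tend to report lower ratios, which is the pattern a small-study effect would produce.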
Several factors influenced WTP per QALY values, including severity of the hypothetical scenario, outcomes, and duration of scenario. Therefore, we recommend that these factors be presented and clearly explained to respondents in future stated preference studies.
This review included only stated preference studies. The stated preference method is useful because it can derive WTP per QALY values for a number of specific scenarios. However, subjects may have difficulty imagining the scenarios. Imagining them can be challenging without a full description of all relevant components, including the hypothetical condition, severity, outcomes, and duration. Nevertheless, by specifying a wide range of scenarios, stated preference studies can inform decision making across varying conditions. This contrasts with the revealed preference method, in which WTP per QALY can be derived only for certain conditions or situations, providing fewer insights to support informed decision making.
Some researchers have argued that WTP per QALY estimated from the stated preference method might not be relevant for policy making [16,49,57]. However, others have argued that countries need a robust and simple method to find the WTP threshold for a QALY, since it could inform the political debate on the allocation of health care resources [6,18-28,33,34,37,58,59]. A number of countries have estimated WTP per QALY using stated preference methods to support policy decision making [6,24].
Finally, there has been debate on the usefulness and limitations of WTP per QALY for policy decision making [9,49,57,59]. An important issue is whose perspective preferences are elicited from. Two studies estimated WTP per QALY values from a societal perspective [6,20], while most studies focused on the individual's perspective. Equity concerns arise further in studies using the individual perspective, as WTP values might be affected by individuals' income. Therefore, WTP per QALY values derived from individuals might raise distributional problems, especially for low-income or unemployed persons. Even though eliciting preferences from individuals is consistent with welfare economic theory, individuals' valuations based on a societal perspective could provide information relevant to decision making within a health care system [6,20,49,60].
A number of limitations of this study should be acknowledged. First, this study used the WHO recommendation as the reference CE threshold for each country, since it was not feasible to identify each country's explicit CE threshold. However, the WHO recommendation is recognized as the best available benchmark. Second, because some studies used a number of scenarios and provided more than one WTP per QALY value, the average values might be weighted towards such studies more than towards studies providing only one WTP per QALY value. Third, interaction effects among factors were not controlled for because of the small sample size. It is noteworthy that the mean differences might change if the factors interact.
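The second limitation can be made concrete with a small Python sketch that contrasts a pooled mean over all extracted values with a mean of per-study means. The study labels and ratios are hypothetical and chosen only to show how a study contributing many values can dominate the pooled average.

```python
from statistics import mean

# Hypothetical ratios of WTP per QALY to GDP per capita, grouped by study.
ratios_by_study = {
    "study_A": [0.2, 0.3, 0.4, 0.3, 0.3, 0.3],  # many scenarios, low ratios
    "study_B": [2.0],                            # a single value
    "study_C": [1.0, 1.2],
}

# Pooled mean: every extracted value counts once, so study_A carries 6 of 9 weights.
pooled_mean = mean(r for values in ratios_by_study.values() for r in values)

# Study-level mean: each study counts once, regardless of how many values it reports.
study_level_mean = mean(mean(values) for values in ratios_by_study.values())

print(f"pooled mean: {pooled_mean:.2f}")
print(f"mean of study means: {study_level_mean:.2f}")
```

In this made-up example the pooled mean sits well below the mean of study means because the study with the most scenarios also has the lowest ratios; with real data the direction of the shift would depend on which studies contribute the most values.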

Conclusions
This systematic review provides a summary of all studies estimating WTP per QALY in the existing literature. A description of the similarities and differences in how studies have been conducted provides a good foundation for defining good practice for this kind of study. The variation in the ratio of WTP per QALY to GDP per capita, which depended on several factors, may prompt discussion of CE threshold policy. Our findings provide pivotal evidence to enable policy makers to initiate conversations among themselves and with stakeholders on how decisions can be made and what criteria they should be based on, so that improved overall population health can be achieved through an evidence-informed health care decision-making system.

Supporting Information
S1 PRISMA Checklist. (DOC)