Living up to expectations: Experimental tests of subjective life expectancy as reference point in time trade-off and standard gamble

Earlier work suggested that subjective life expectancy (SLE) functions as a reference point in time trade-off (TTO), but has not tested or modelled this explicitly. In this paper we construct a model based on prospect theory to investigate these predictions more thoroughly. We report the first experimental test of reference-dependence with respect to SLE for TTO and extend this approach to standard gamble (SG). In two experiments, subjects’ SLEs were used to construct different versions of 10-year TTO and SG tasks, with the gauge duration described as occurring either above or below life expectation. Our analyses suggest that both TTO and SG weights were affected by SLE as predicted by prospect theory with SLE as reference point. Subjects gave up fewer years in TTO and were less risk-tolerant in SG below SLE, implying that weights derived from these health state valuation methods for ...


Introduction
Time trade-off (TTO) and standard gamble (SG) are two popular methods to value health states, i.e. to obtain utility weights relevant for determining quality adjusted life-years (QALYs). Although the methods share a similar purpose, their framing and outcomes differ substantially (Bleichrodt and Johannesson, 1997; Bleichrodt, 2002), with SG weights typically being higher than TTO weights ... difference between TTO and SG, for which some empirical support was found by Lipman et al. (2019b).
Prospect theory was originally developed as an alternative to expected utility (EU) theory for decision making under risk and uncertainty (Kahneman and Tversky, 1979;Tversky and Kahneman, 1992). Most importantly, prospect theory assumes reference-dependence, i.e. outcomes are not evaluated in final terms, but as changes relative to a reference point (RP). The RP is a neutral outcome, such as the status quo (i.e. current health), but many alternative comparators have been argued to be able to serve as RP, such as the lowest possible outcome (Attema et al., 2012;Bleichrodt et al., 2001), the guaranteed outcome in TTO or SG (van Osch and Stiggelbout, 2008;van Osch et al., 2004), or the best outcome available (van Osch et al., 2006). Furthermore, the RP may be influenced or formed by aspirations, expectations, norms, and social comparisons (Tversky and Kahneman, 1991). It is, however, paramount to determine the exact location of the RP, as this 'neutral outcome' will divide all other outcomes into gains and losses. Within prospect theory, this is especially relevant, as it assumes loss aversion, i.e. losses (relative to the RP) carry more weight than gains of the same size. Furthermore, in prospect theory probabilities can be transformed non-linearly by means of a probability weighting function, which may also differ between gains and losses.
Often it remains unclear exactly how RPs are selected, and how RP selection should be modelled within prospect theory (Wakker, 2010). Two different streams of literature have produced insights on the role of RPs in health-related decision making, which we try to unify in this paper. First, in applications of prospect theory to health outcomes, typically some plausible assumption is made about which outcome could serve as RP. This approach, where the RP is typically selected from the outcomes available within the scenarios presented to respondents, allows for tractable modelling and the formation of empirical predictions based on these assumptions. For example, earlier work on RP location for TTO and SG has suggested that the certain outcome, i.e. the impaired health state, will likely serve as RP in these health state valuation exercises (van Osch et al., 2006). Using such an approach to RP selection, prospect theory has been successfully applied in the health domain. For example, earlier work showed that the main tenets of prospect theory (e.g. loss aversion and probability weighting) apply to decisions about human lives (Kemel and Paraschiv, 2018), length of life (Attema et al., 2013; Lipman et al., 2019b; Verhoef et al., 1994; Treadwell and Lenert, 1999; Lipman et al., 2018), and quality of life (Attema et al., 2016). Second, a literature exists suggesting that RPs that originate outside the specific decision task at hand may also be selected by respondents. Such studies typically observe some effect of these reference-outcomes on decision-making or well-being, and conjecture that this effect may be due to reference-dependence. Examples of such suggested reference-dependence are RPs based on expectations for length of life (van Nooten and Brouwer, 2004; Van Nooten et al., 2009) and quality of life (Wouters et al., 2015), or social comparisons (Wouters, 2016).
In this paper, we focus on the effects of individuals' subjective life expectancy (SLE), i.e. self-reported anticipated length of life, which could serve as an RP as defined within prospect theory in health state valuations. It is well-known that many individuals expect to live longer than actuarial life expectancy (Péntek et al., 2014; Rappange et al., 2016). Gauge durations in TTO typically do not coincide with these expectations; frequently, the projected life span in TTO is considerably shorter. Although individuals may not be fully aware of this reduction during health state valuation (van Nooten et al., 2014), earlier work on SLE has consistently found that individuals with higher SLE gave up fewer years in TTO, and thus the associated QALY weights were higher. We will refer to these changes in QALY weights for durations further away from expectations about length of life as the 'SLE effect'. This SLE effect was found for 10-year TTOs (van Nooten et al., 2009), for patients valuing their own health state (Heintz et al., 2013), and for TTOs using a lifetime time-horizon (van Nooten and Brouwer, 2004). Considering that in most cases life years traded off in TTO fall short of SLE, it may seem plausible to assume that these life years are perceived as being in the loss domain already, and thus given up reluctantly (van Nooten and Brouwer, 2004; Van Nooten et al., 2009). This earlier work postulated that the SLE effect may occur as a result of loss aversion, yielding unwillingness in TTO exercises to further reduce lifetime compared to individuals' SLE, which serves as RP.
However, this explanation of the SLE effect has never been modelled adequately or tested directly, as earlier work on the SLE effect has relied on investigation of heterogeneity in SLE by means of an observational between-subjects approach, i.e. explaining differences in TTO responses by differences in SLE. Furthermore, if the SLE effect applies to TTO weights as a result of reference-dependence, one could also expect such effects on SG, as this method may also be affected by loss aversion (Bleichrodt, 2002). To our knowledge, this has not yet been tested. Therefore, in this paper we extend earlier work on SLE effects in health state valuation by:
The remainder of this paper is structured as follows. In Section 2 we define our theoretical model based on prospect theory and in Section 3 we derive predictions. Sections 4 (Study 1) and 5 (Study 2) report the experiments used to test these predictions using different versions of TTO and SG. Study 1 applied the experimental methodology with a convenience sample of students. The results of this study suggest that SLE indeed serves as RP for TTO and SG. In Study 2, the external validity of these findings is tested by recruiting a sample of individuals aged 60 years and older, largely confirming the results from Study 1. In the final sections we discuss these results and conclude.

Notation
Outcomes in TTO and SG are described as health profiles $(Q, t)$, where $Q$ represents health status and $t$ denotes the age at which the profile ends (e.g. living in a wheelchair until age 85), with D and FH denoting the states Dead and Full Health, respectively. Subscripts (e.g. $a, r, x, y$) are used to indicate chronic health profiles faced by a decision maker with age $t_a$, where duration is defined as $T_x = t_x - t_a$. Importantly, $t_a$ can, but need not, be the decision maker's current age (it could be any $t_a > 0$). Risky prospects are defined as $(Q_x, T_x)_p(Q_y, T_y)$, i.e. health profile $(Q_x, T_x)$ with probability $p$, and health profile $(Q_y, T_y)$ with probability $1 - p$. Preference relations are defined as usual, i.e. they are weak orders (complete and transitive), and denoted by $\succ$ (strict preference), $\succcurlyeq$ (weak preference), and $\sim$ (indifference).
The TTO method asks for a time equivalent in full health, i.e. the number of years $T_y$ in FH that yields indifference with $T_x$ years in health state $Q_x$. The number of years $T_y$ is varied until the respondent is indifferent between the two options, i.e. $(Q_x, T_x) \sim (FH, T_y)$. The SG method involves a choice between a number of years ($T_x$) in health state $Q_x$ for certain and a gamble with two outcomes, which are FH during the same time period ($T_x$) and D. Probability $p$ is varied until the respondent is indifferent between the two alternatives, i.e. $(Q_x, T_x) \sim (FH, T_x)_p(D)$. Typically, preferences in TTO and SG are modelled within the general QALY model (Miyamoto and Eraker, 1989), which assumes that chronic health profiles $(Q_x, T_x)$ can be evaluated by the utility function $V(\cdot)$:
$$V(Q_x, T_x) = U(Q_x)\,L(T_x), \tag{1}$$
with $U(Q)$ denoting utility of health status and $L(T)$ denoting the utility of $T$ life years. Assuming $L(T) = T$ (i.e. the linear QALY model), with the common normalisation such that $U(FH) = 1$, TTO indifferences can be evaluated by:
$$U(Q_x) = \frac{T_y}{T_x}. \tag{2}$$
SG indifference, on the other hand, additionally assuming EU and $V(D) = 0$, can be evaluated by:
$$U(Q_x) = p. \tag{3}$$
Although Eq. (2) and Eq. (3) are only valid under these strict assumptions (and more general derivations for TTO and SG are available, see for example Lipman et al., 2019b), these equations are often used in large scale health state valuations (Versteegh et al., 2016; Brazier et al., 2002).
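As a minimal sketch of how an elicited indifference is turned into a QALY weight under these strict assumptions (weight $T_y / T_x$ for TTO, weight $p$ for SG), consider the following Python functions. The function names and the example inputs are illustrative assumptions and are not taken from the study materials.

```python
def tto_weight(t_x: float, t_y: float) -> float:
    """QALY weight from a TTO indifference (Q_x, T_x) ~ (FH, T_y).

    Assumes the linear QALY model, L(T) = T, and U(FH) = 1.
    """
    if t_x <= 0 or not 0 <= t_y <= t_x:
        raise ValueError("expected t_x > 0 and 0 <= t_y <= t_x")
    return t_y / t_x


def sg_weight(p: float) -> float:
    """QALY weight from an SG indifference (Q_x, T_x) ~ (FH, T_x)_p (D).

    Assumes expected utility and V(D) = 0; p is the probability of full health.
    """
    if not 0 <= p <= 1:
        raise ValueError("p must be a probability")
    return p


# Illustrative inputs: 8 years in FH judged equivalent to 10 years in Q_x,
# and indifference at a 20% risk of immediate death (p = 0.8).
print(tto_weight(10, 8))  # 0.8
print(sg_weight(0.8))     # 0.8
```

Under these assumptions both methods would assign the same weight to the health state in this example; any systematic gap between them in practice points to a violation of linear utility or EU.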

Reference-dependence model for SG and TTO
Reference points play no role within the frameworks of EU and the general QALY model. Thus, in order to test whether SLE serves as RP, we will supplement the general QALY model with prospect theory, closely following the model developed in Lipman et al. (2019b). This means that we assume that the general QALY model holds, together with the additional assumptions outlined below.
We assume separate evaluations of gains and losses in life duration compared to an RP, denoted $T_r$. This RP is an expected health profile, which is taken to last for $T_r$ years, starting from the age ($t_a$) of the decision maker until their SLE ($t_r$), i.e. $T_r = [t_a, t_r] = t_r - t_a$. Throughout we will denote durations of health profiles $(Q_x, T_x)$ as deviations with respect to this RP as follows: we will write $(Q_x, T_x^{*})$, with $T_x^{*} = T_x - T_r$. For example, imagine a 50-year-old subject with an SLE of living until 80. The health profile of living in a wheelchair until age 70 will be denoted as (living in wheelchair, $-10$) (for more examples, see Online Supplements). We restrict our prospect theory model to life duration, even though it has been suggested that reference-dependence may also exist for health status (Wouters et al., 2015). However, both from a theoretical and from an empirical point of view such reference-dependence for $Q$ is hard to approximate. That is, prospect theory is typically applied to single-attribute outcomes, such as money, while health profiles consist of both life duration and health status. Multi-attribute characterizations of prospect theory exist, but because health status is a qualitative measure, loss aversion is not theoretically meaningful for this attribute (Bleichrodt and Miyamoto, 2003).
As a solution, we apply an attribute-specific evaluation by making three modifications to the general QALY model, to allow testing for reference-dependence with SLE as RP. First, we modify $L(T)$ in the general QALY model to $L^{i}(T^{*})$, which is a standard ratio scale utility function that can differ between gain outcomes (i.e. $(Q_x, T_x^{*})$ with $T_x^{*} \geq 0$, $i = +$) and loss outcomes (i.e. $(Q_x, T_x^{*})$ with $T_x^{*} < 0$, $i = -$), and is strictly increasing and real-valued. Second, loss aversion is incorporated into our model by taking $L^{-}(T^{*}) = \lambda L^{i}(T^{*})$ for $T^{*} < 0$, where $\lambda$ denotes a loss aversion index, with $\lambda > 1$ [$\lambda = 1$, $\lambda < 1$] indicating loss aversion [loss neutrality, gain seeking]. Third, we incorporate probability weighting, by evaluating probabilities in risky prospects by probability weighting functions $w^{i}$, $i = +, -$, that assign a number to each probability, with $w^{i}(0) = 0$ and $w^{i}(1) = 1$. These probability weighting functions can be different for gains and losses.
We do not modify $U(Q)$ of the general QALY model, but we attempt to control for possible effects of reference-dependence of health status by applying our model only to health profiles where health status is better than what is considered acceptable at the ages under consideration. If, as Wouters et al. (2015) suggested, such acceptability serves as RP for health status, this restriction to acceptable health states may avoid confounding effects, as losses will only occur in terms of duration while health status will always be above expectation.
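Thus, as in Lipman et al. (2019b), preferences over risky prospects with both gain and loss outcomes, i.e. $(Q_x, T_x^{*})_p(Q_y, T_y^{*})$ with $T_x^{*} > 0 > T_y^{*}$, are evaluated by:
$$w^{+}(p)\,U(Q_x)\,L^{+}(T_x^{*}) + w^{-}(1-p)\,U(Q_y)\,L^{-}(T_y^{*}),$$
while preferences over risky prospects $(Q_x, T_x^{*})_p(Q_y, T_y^{*})$ for either gains or losses are evaluated by:
$$w^{i}(p)\,U(Q_x)\,L^{i}(T_x^{*}) + \left(1 - w^{i}(p)\right)U(Q_y)\,L^{i}(T_y^{*}),$$
where $i = +\ [-]$ when $T_x^{*}, T_y^{*} > [<]\ 0$, i.e. both outcomes are gains or losses. Lipman et al. (2019b) show that when $w^{i}(p) = p$, $\lambda = 1$, and no distinction is made between gains and losses (i.e. no reference-dependence), this model reduces to the general QALY model.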
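To make the role of sign-dependent utility and the loss aversion index concrete, the sketch below implements one possible utility function for life duration relative to the RP. The power form and the default parameter values are assumptions chosen purely for illustration; the model above only requires a strictly increasing, real-valued function for gains and for losses.

```python
def duration_utility(t_star: float, alpha: float = 0.8, lam: float = 2.0) -> float:
    """Utility of life duration T* measured relative to the reference point.

    Assumed power form (not taken from the paper): concave for gains,
    convex for losses, with losses scaled by the loss aversion index lam.
    """
    if t_star >= 0:
        return t_star ** alpha
    return -lam * (-t_star) ** alpha


# With alpha = 1 and lam = 1 the function is linear and sign-independent,
# which corresponds to the linear QALY reference case.
print(duration_utility(5.0, alpha=1.0, lam=1.0))   # 5.0
print(duration_utility(-5.0, alpha=1.0, lam=1.0))  # -5.0

# With lam > 1 a loss weighs more heavily than a gain of the same size.
print(abs(duration_utility(-5.0)) > duration_utility(5.0))  # True
```

Setting `alpha` below 1 also produces diminishing sensitivity: a year close to the RP changes utility more than a year far away from it, for gains and losses alike.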

Predictions
In this paper we consider two versions of TTO and SG. Typically, TTO and SG involve 10-year durations that start at current age. Instead, in this paper, we let the 10-year period in a reduced health state, which occurs in both TTO and SG, a) start at each individual's SLE, i.e. $t_a = t_r$, or b) end at each individual's SLE, i.e. $t_a = t_r - 10$. If SLE functions as RP, for a) the gauge duration occurs completely above SLE and thus always involves considerations in the gain domain (because $t_a = t_r$ gives $T_x^{*}, T_y^{*} > 0$). Similarly, for b) the gauge duration occurs completely below SLE and thus involves trade-offs in the loss domain (because $t_a = t_r - 10$ gives $T_x^{*}, T_y^{*} \leq 0$). Therefore, we label versions with gauge durations completely above SLE as gain versions (i.e. TTO-gains and SG-gains), while those versions with life years occurring completely below SLE are labelled as loss versions (i.e. TTO-losses and SG-losses). To distinguish between the starting ages in these versions for gains and losses, we add superscripts $g$ and $l$, i.e. $t_a^{g} = t_r$ and $t_a^{l} = t_r - 10$. As a final notational convention, given that both versions have the same durations $T_x$ (10 years starting at different ages), for clarity, we will add superscripts to health status for health profiles $(Q_x, T_x^{*})$, such that $(Q_x^{g}, T_x^{*})$ and $(Q_x^{l}, T_x^{*})$ refer to profiles in gain (starting at $t_a^{g}$) or loss versions (starting at $t_a^{l}$), respectively. For example, consider a subject expecting to live until age 80 ($t_r = 80$). She would receive gain versions with $t_a^{g} = 80$ and loss versions with $t_a^{l} = 70$. If SLE indeed serves as RP, this shift from $t_a^{g}$ to $t_a^{l}$ allows us to test the SLE effect, as it changes the perception of life years with respect to the RP.
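The construction of the two versions from a subject's SLE can be sketched as follows. The helper names are hypothetical; the only inputs taken from the text are the 10-year gauge duration and the definition of durations as deviations from the RP (the duration of the profile minus the duration up to SLE).

```python
def version_start_ages(sle: int, gauge: int = 10) -> dict:
    """Starting ages t_a for the gain and loss versions, given SLE (t_r)."""
    return {"gains": sle, "losses": sle - gauge}


def deviation_from_rp(t_a: int, t_x: int, t_r: int) -> int:
    """Duration relative to the RP: (t_x - t_a) - (t_r - t_a)."""
    return (t_x - t_a) - (t_r - t_a)


sle = 80  # the subject from the example, expecting to live until age 80
starts = version_start_ages(sle)
print(starts)  # {'gains': 80, 'losses': 70}

for version, t_a in starts.items():
    death = deviation_from_rp(t_a, t_a, sle)      # immediate death at t_a
    end = deviation_from_rp(t_a, t_a + 10, sle)   # surviving the full 10 years
    print(version, death, end)
# gains 0 10
# losses -10 0

# A 50-year-old with SLE 80 who lives until age 70 ends 10 years short of SLE.
print(deviation_from_rp(50, 70, 80))  # -10
```

The printed ranges show why the same 10-year task falls entirely in the gain domain in one version and entirely in the loss domain in the other.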
In the remainder of this section, we will employ our theoretical model based on prospect theory with SLE as RP to derive predictions about the SLE effect on TTO and SG. We will obtain these predictions by illustrating the implications of our prospect theory model as opposed to a reference case, in which linear QALYs and EU hold (i.e. Eq. (2) and Eq. (3) can be applied). For the sake of brevity and clarity, we focus on providing graphical illustrations of these predictions in Fig. 1 and Fig. 2. A complete and formal proof of these predictions can be found in Online Supplements.

SLE effects for TTO
For TTO, consider as reference case a subject willing to give up 2 years with reduced health status ($Q_x$) to obtain full health for 8 more years in the gain version. Using our notation with SLE as RP, this yields the following indifference: $(Q_x^{g}, 10) \sim (FH^{g}, 8)$. That is, in the gain version the subject is indifferent between gaining 10 years beyond SLE in health state $Q_x$ and gaining 8 years in full health. We will derive predictions from our model as to what this indifference implies for the years given up in loss versions, i.e. predict $T_y^{*}$ in $(Q_x^{l}, 0) \sim (FH^{l}, T_y^{*})$. We first consider the reference case, with linear utility (i.e. $L^{-}(T^{*}) = L^{+}(T^{*}) = T^{*}$) and no loss aversion ($\lambda = 1$), which yields the following indifferences: $(Q_x^{g}, 10) \sim (FH^{g}, 8)$ and $(Q_x^{l}, 0) \sim (FH^{l}, -2)$ for gain and loss versions, respectively (as each year has the same value). In Fig. 1, we have represented such a combination of preferences more generally. Initially for the reference case, we observe symmetric indifferences: $(Q_x^{g}, T_x^{*}) \sim (FH^{g}, T_y^{*})$ and $(Q_x^{l}, T_x^{*}) \sim (FH^{l}, T_y^{*})$. That is, shifting $t_a^{g}$ to $t_a^{l}$, which in our experiments with 10-year durations gives $T_r = T_x^{*}$ for losses, does not affect preferences, as $(T_x^{*} - T_y^{*})$ is equal between gains and losses. These indifferences indicate that in both scenarios each year given up in $Q_x$ (i.e. $T_x^{*} - T_y^{*}$) exactly offsets an equal part of the value of the quality of life gained ($U(FH) - U(Q_x)$). However, such a combination of indifferences does not take into account any discrepancies between gains and losses. In Fig. 1 we provide two illustrations of how the SLE effect for TTO responses may arise due to: a) non-linear utility curvature, and b) loss aversion.
First, whereas TTO weights are typically derived assuming that utility of life duration is linear, i.e. $L^{-}(T^{*}) = L^{+}(T^{*}) = T^{*}$, earlier work on prospect theory has shown that this assumption is likely to be invalid for health outcomes (e.g. Attema et al., 2013; Kemel and Paraschiv, 2018; Lipman et al., 2018, 2019b) and monetary outcomes (e.g. Abdellaoui, 2000; Abdellaoui et al., 2008, 2016; Bruhin et al., 2010). Instead, in prospect theory utility for gains is typically concave, and utility for losses is convex, i.e. utility for life duration is S-shaped. This inflection point in the utility curve may affect years given up in TTO-gains and TTO-losses versions, as it implies diminishing marginal sensitivity for additional life years gained or lost further away from $T_r$, as opposed to the linearity assumed in the reference case. Hence, it becomes important to consider where the life years given up in $Q_x$, and the years in which improved quality of life ($U(FH) - U(Q_x)$) is realized, fall along this S-shaped curve (we illustrate these effects in Fig. 1). For TTO-gains, the years given up in $Q_x$ (e.g. between 8 and 10) are further away from $T_r$ than the years in which improved quality of life ($U(FH) - U(Q_x)$) is realized (e.g. between 0 and 8). Given that utility for gains is concave, in contrast to the reference case where each year is valued equally, we should find that each year given up in $Q_x$ gets less weight than each year in which improved quality of life ($U(FH) - U(Q_x)$) is experienced. Compared to the linear reference case, this yields a convex indifference curve, and the respondent will give up more life years to offset the improvement in quality of life ($U(FH) - U(Q_x)$) and restore indifference. Hence, we obtain $(Q_x^{g}, T_x^{*}) \sim (FH^{g}, T_y^{*\prime})$, with $T_y^{*\prime} < T_y^{*}$. For TTO-losses, however, the years given up in $Q_x$ (e.g. between 0 and $-2$) occur closer to $T_r$ than the years in which ($U(FH) - U(Q_x)$) is realized (between $-10$ and $-2$). As such, when utility for losses in life duration is convex, each year in which the improvement in quality of life is obtained gets less weight than each year given up. As a result, compared to the reference case, this yields a concave indifference curve, and the respondent should give up fewer years to offset the improvement in quality of life ($U(FH) - U(Q_x)$) and restore indifference. Hence, we obtain $(Q_x^{l}, T_x^{*}) \sim (FH^{l}, T_y^{*\prime})$, with $T_y^{*\prime} > T_y^{*}$.
Second, we take into account loss aversion, i.e. increased sensitivity to losses relative to $T_r$. Loss aversion yields reluctance to give up life years, and to account for this effect each year given up in $Q_x$ should offset a larger part of the quality of life gained ($U(FH) - U(Q_x)$). This yields the steeper indifference curve in Fig. 1, compared to the reference case where people are equally sensitive to gains and losses. As a result, if one is loss averse and durations in TTO occur below $T_r$, fewer years ($T_y^{*\prime\prime}$) should be given up to restore indifference, yielding $(Q_x^{l}, T_x^{*}) \sim (FH^{l}, T_y^{*\prime\prime})$. Thus, we predict that loss aversion with respect to SLE will decrease the years given up for TTO-losses versions as compared to gain versions. This conclusion also holds when taking into account non-linearity in the utility curve for life duration (see Fig. 1).
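The effect of concave utility in the gain version can be illustrated numerically. The sketch below assumes a power utility for gains in life duration and holds the utility of the health state fixed at the value implied by the linear reference case (0.8); both choices are illustrative assumptions rather than the specification used in the paper, and the loss version is not covered here.

```python
def tto_gain_equivalent(u_q: float, t_x: float = 10.0, alpha: float = 1.0) -> float:
    """Years in full health that are equivalent to t_x years in Q_x (gains only).

    Solves u_q * t_x**alpha = t_y**alpha for t_y, assuming U(FH) = 1 and a
    power utility for gains in life duration (an illustrative assumption).
    """
    return t_x * u_q ** (1.0 / alpha)


u_q = 0.8  # implied by giving up 2 of 10 years under linear utility

linear = tto_gain_equivalent(u_q, alpha=1.0)
concave = tto_gain_equivalent(u_q, alpha=0.7)

print(round(linear, 2))   # 8.0
print(round(concave, 2))  # 7.27
print(10.0 - concave > 10.0 - linear)  # True: more years are given up
```

With concave utility the last years of the gauge duration carry less weight, so a larger number of them must be given up to offset the same improvement in quality of life.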

SLE effects for SG
For SG, consider a subject willing to accept at most a 20% risk of immediate death for SG-gains. In our notation, this yields the following indifference for gains: $(Q_x^{g}, 10) \sim (FH^{g}, 10)_{0.8}(D)$. We will derive predictions from our model as to what this indifference implies for the probability of death accepted in loss versions. In the reference case, linear QALYs and EU hold, i.e. the subject will also accept at most a 20% risk of immediate death for the loss version, i.e. $(Q_x^{l}, 0) \sim (FH^{l}, 0)_{0.8}(D)$. In Fig. 2, we have represented such a combination of preferences more generally. Initially we observe the same indifference, i.e. $(Q_x^{g}, T_x^{*}) \sim (FH^{g}, T_x^{*})_{p}(D)$ and $(Q_x^{l}, T_x^{*}) \sim (FH^{l}, T_x^{*})_{p}(D)$. That is, shifting $t_a^{g}$ to $t_a^{l}$ does not affect preferences, i.e. people are willing to risk the same probability of (D) for both SG-gains and SG-losses. This combination of indifferences in the reference case indicates that in both scenarios the possibility of an improvement in quality of life ($U(FH) - U(Q_x)$) for $T_x$ exactly offsets the generally small chance of dying immediately. If the difference in quality of life ($U(FH) - U(Q_x)$) increases, a larger chance of dying immediately will be accepted. However, just as for TTO, such a combination of indifferences does not take into account any discrepancies in the evaluation of gains and losses.
In Fig. 2 we provide two illustrations of how SG responses are affected when SLE serves as RP: a) probability weighting (which may be different between gain and loss versions), and b) loss aversion. First, whereas in the reference case probabilities are treated linearly (and thus also identically between gains and losses), our model based on prospect theory allows non-linear probability weighting. Importantly, it is typically observed that probability weighting is less pronounced for losses compared to gains, that is, probability weighting is less inverse-S shaped, which has been found for health outcomes (e.g. Attema et al., 2013, 2016) and monetary outcomes (e.g. Kahneman and Tversky, 1979; Tversky and Kahneman, 1992; Abdellaoui, 2000). This implies that if SLE serves as RP, the same (small) probability of an extreme outcome receives more decision weight for SG-gains versions compared to SG-losses versions. Inversely, when we observe a gain version indifference $(Q_x^{g}, T_x^{*}) \sim (FH^{g}, T_x^{*})_{p^{g}}(D)$, then a higher probability of the extreme outcome (D) may be accepted in the equivalent loss version $(Q_x^{l}, T_x^{*}) \sim (FH^{l}, T_x^{*})_{p^{l}}(D)$. For example, imagine a subject with $p^{g} = 20\%$, which implies that immediate death with decision weight $w^{+}(0.20)$ offsets the possible gain of quality of life ($U(FH) - U(Q_x)$) with duration $T_x^{*}$. When we have $w^{+}(p^{g}) > w^{-}(p^{l})$ for $p^{g} = p^{l}$, an increase in $p^{l}$ to $p^{l\prime}$ is required to restore indifference, i.e. for the disutility of (D) to offset ($U(FH) - U(Q_x)$).
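The claim that a small probability receives more decision weight for gains than for losses can be illustrated with the one-parameter weighting function of Tversky and Kahneman (1992). The parameter values below (0.61 for gains, 0.69 for losses) are their estimates for monetary outcomes and are used here only as an assumption for illustration; they are not estimates from this study.

```python
def tk_weight(p: float, gamma: float) -> float:
    """Inverse-S probability weighting function of Tversky and Kahneman (1992)."""
    return p ** gamma / (p ** gamma + (1 - p) ** gamma) ** (1 / gamma)


GAMMA_GAINS = 0.61   # assumed: monetary estimate for gains
GAMMA_LOSSES = 0.69  # assumed: monetary estimate for losses (closer to linear)

p = 0.20
w_gain = tk_weight(p, GAMMA_GAINS)
w_loss = tk_weight(p, GAMMA_LOSSES)

# A small probability is overweighted in both domains, but more so for gains.
print(w_gain > w_loss > p)  # True

# The function respects the boundary conditions w(0) = 0 and w(1) = 1.
print(tk_weight(0.0, GAMMA_GAINS), tk_weight(1.0, GAMMA_GAINS))  # 0.0 1.0
```

Because the same 20% probability carries less decision weight under the less curved loss function, a somewhat larger probability is needed in the loss version to reach the same decision weight.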
Second, we take into account loss aversion by again assuming increased sensitivity to the possibility of losing compared to SLE, i.e. to durations below $T_r$. Importantly, the consequence of immediate death (D) differs between gain and loss versions: in the SG-gains version, it entails a 20% chance of living up to SLE, while for loss versions dying immediately means living 10 years shorter than expected (i.e. a loss). Hence, SG-gains versions, in our experiment, involved no losses compared to $T_r$, and were not affected by loss aversion. For loss versions, if losses are incurred more reluctantly, smaller probabilities ($p^{l} < p^{g}$) of a loss are accepted for the same difference in quality of life ($U(FH) - U(Q_x)$). In Fig. 2 we illustrate this by a steeper indifference curve.
Summarizing, for TTO our model predicts two SLE effects, both decreasing the life years given up for losses, while for SG our model predicts SLE effects in opposite directions, where the net direction is determined by the degree of loss aversion and differences in probability weighting for gains and losses. Given that these predictions differ between TTO and SG, shifting gauge duration from above to below SLE (i.e. moving from $t_a^{g}$ to $t_a^{l}$) may yield different SLE effects between these two methods. We can derive no predictions about differences in magnitude of these SLE effects for TTO and SG, as they are affected by different components of prospect theory.

Study 1
In this first experiment, we tested our predicted SLE effects for TTO and SG in a lab experiment with a convenience sample of students.

Methods
This lab experiment started with several questions regarding expectations about length and quality of life, followed by an elicitation of TTO and SG. Example instructions and screenshots can be found in Online Supplements. The experiment used a 2 by 2 (method: TTO vs. SG, version: losses vs. gains) within-subjects design, with randomization by method. The experiment was completed by 102 Business Administration students, recruited in the Erasmus Behavioral Lab in Rotterdam. Of these, 71 were male, and the mean age of the sample was 20.25 (SD = 1.22). All students were awarded course credit for participation in this 30-minute study.

Measures of expectations about length and quality of life
We measured students' expectations about length of life with the following questions (in this order): a) 'What is the minimum age you would hope to become?', b) 'What is the maximum age you would want to become?', and c) 'How old do you expect to become?'. The first two measures were obtained to explore how the typical estimates for SLE fall in between individuals' aspired minimum and maximum age, while question c) measures SLE, using a similar phrasing as van Nooten et al. (2009). Students answered all three questions using a drop-down menu with answers in full years ranging from 30 to 120. To check if health states were considered acceptable, we also explored expectations about quality of life by obtaining a measure of acceptability for the health states that were used to apply TTO and SG (see Table 1). These questions were included as a manipulation check, to determine whether our model, which pertained to acceptable health states, can be applied. To introduce this concept, we used the following instruction (adapted from Wouters et al., 2015): 'In what follows you will receive questions regarding health at different ages. Generally, health deteriorates when we get older. Consider for example an 80 year-old person who is not able to walk further than 1 km. You might find this an acceptable condition for someone of 80, but less acceptable for 20 year old persons.' Next we asked them to rate all three health states using an identical drop-down menu ranging from 30 to 120, using the following question: 'Could you please indicate from which age onwards you find the following three health states acceptable?' Students could also answer 'Never', if they felt that a deteriorated health state was not acceptable at any age.

Operationalization of TTO and SG
All versions of TTO and SG featured a gauge duration of 10 years followed by immediate death, which is typical for this type of valuation exercise (Van Nooten et al., 2009). These four valuation exercises (TTO-gains, TTO-losses, SG-gains and SG-losses) were all completed for three health states described by means of the EuroQol EQ-5D-5L classification system (see Table 1 for the selected health states). TTO and SG were operationalised by using two-stage choice lists (see Online Supplements), which were computerized via Qualtrics to prohibit multiple switching and violations of (stochastic) dominance within each choice list. For TTO, a first choice list identified indifference in years, and a second choice list refined this in months. For SG, a first choice list identified indifference at probability intervals of 10%, and a second choice list refined this in percentage points.

Results
Table 2 reports descriptive statistics for our measures of expectations about length and quality of life. On average, students expected to become close to 85 years old, while wishing to become at least around 77 and at most close to 100 years. As we restricted our theoretical analyses to health states considered acceptable, we determined if students deemed health states Q1, Q2 and Q3 acceptable at all ages used in the implemented TTO or SG versions. Overall, health states Q1 and Q2 were considered acceptable by most students for all ages considered in this experiment, with 84% (Q1) and 72% (Q2) of our sample indicating that such a health status is acceptable from a lower age than the ages considered in our experiment. The most severe health state (Q3) was not considered acceptable at the lowest age considered (i.e. $t_a^{l}$), with only 34% of our sample considering such health problems acceptable at the ages presented in the loss versions of TTO and SG. For gain versions, this percentage was considerably higher, at 80%. This indicates that if reference-dependence exists for health status (as proposed by Wouters et al., 2015), this RP may fall in between the ages considered in the gain and loss versions of TTO and SG for health state Q3. However, we find relatively little non-trading (i.e. QALY weights of 1), with rates of non-trading ranging from 2% to 18% of the sample. Hence, to see if acceptability affected our main results, we ran several tests to explore whether this violation of the simplifying assumptions described in our theoretical model affects TTO and SG responses (see Online Supplements). We did not observe such an effect of acceptability of health status on TTO and SG responses. As such, we report our main results without excluding respondents from the sample.

Testing predicted SLE effects for TTO
First, we tested our predictions about SLE effects in the two versions of TTO (i.e. TTO-gains and TTO-losses). Table 3 shows aggregate results for TTO responses in both versions. In accordance with our predictions, fewer life years were given up in loss versions of TTO compared to gain versions for all health states (Wilcoxon tests, all p's < 0.001). According to our model, this suggests that students were either loss averse or showed less pronounced utility curvature for losses in life duration. Inversely, giving up fewer life years for loss versions will yield higher TTO weights, i.e. higher QALY weights assigned to the same health state. When we analysed our data within-subjects, we observed that for Q1, Q2 and Q3 respectively, 61%, 65% and 68% of the sample gave up fewer life years in loss versions. For all three health states, these proportions were significantly larger than the part of our sample that gave up equal life years for both versions, or more life years for loss versions (all $\chi^2$'s (2, N = 102) > 39.71, all p's < 0.001).
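A test of this kind can be sketched as a chi-square goodness-of-fit test over three response categories (fewer, equal, or more years given up in the loss version). The sketch assumes that the null hypothesis is an equal split over the three categories, which the text does not state explicitly, and the counts are hypothetical: only the share in the first category (about 61% of 102) is based on the reported results.

```python
import math


def chi_square_equal_split(counts: list) -> float:
    """Chi-square statistic against an equal split over all categories."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


def p_value_df2(stat: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 2 degrees of freedom.

    For df = 2 the survival function has the closed form exp(-x / 2).
    """
    return math.exp(-stat / 2.0)


# Hypothetical counts for [fewer, equal, more] years given up in loss versions.
counts = [62, 30, 10]

stat = chi_square_equal_split(counts)
print(round(stat, 2))             # 40.47
print(p_value_df2(stat) < 0.001)  # True
```

The closed-form p-value keeps the example free of external dependencies; with more than three categories a statistics library would be needed instead.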

Testing SLE effects for SG
Next, we compared the probabilities of immediate death risked in SG between the two versions (i.e. SG-gains and SG-losses). As can be seen from Table 3, lower probabilities of immediate death were risked in loss versions of SG compared to gain versions (Wilcoxon tests, all p's < 0.001). According to our theoretical model, this implies that the effect of loss aversion was more pronounced than that of differences in probability weighting. In turn, this leads to the conclusion that SG with durations below SLE will yield higher QALY weights for the same health state. When we analysed our data within-subjects, we observed that for Q1, Q2 and Q3 respectively, 51%, 49% and 51% of our sample was willing to take a smaller risk of immediate death in loss versions. For all three health states, these proportions were significantly larger than the part of our sample that assigned equal probabilities to both versions, or was willing to risk a higher chance of immediate death in loss versions (all χ²'s (2, N = 102) > 10.65, all p's < 0.005).

Table 3 Median years given up in TTO and probability of death risked in SG, including within-subject differences between gain and loss versions (columns: Gains, Losses, Diff.).
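The within-subject proportion tests can be sketched as a chi-square goodness-of-fit test over three response categories (smaller, equal, or larger value in the loss version). That the test was run against equal expected proportions is our assumption; the reported 2 degrees of freedom are consistent with it. The counts below are hypothetical. For 2 degrees of freedom the upper-tail p-value equals exp(-χ²/2), so a statistic of 10.65 corresponds to p of roughly 0.0049.

```python
import math

def chi_square_gof_equal(counts):
    """Chi-square goodness-of-fit statistic against equal expected counts."""
    n = sum(counts)
    expected = n / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)

def chi_square_sf_df2(stat):
    """Upper-tail p-value of a chi-square statistic with exactly 2 df."""
    return math.exp(-stat / 2)

# Hypothetical split of N = 102 subjects:
# smaller risk, equal risk, larger risk in the loss version.
counts = [52, 25, 25]
stat = chi_square_gof_equal(counts)
p_value = chi_square_sf_df2(stat)
print(stat, p_value)
```

The closed-form p-value only holds for three categories (2 degrees of freedom); other designs need a general chi-square distribution function.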

Comparing SLE effect between TTO and SG weights
In order to compare SLE effects between TTO and SG, we needed to normalise the weights obtained by these two health state valuation methods to fit on the same scale. We achieved this normalisation by applying the derivation of TTO and SG weights under EU and the linear QALY framework (i.e. Eq. (2) and Eq. (3)). Although this is inconsistent with our theoretical model based on prospect theory, it is in line with how TTO and SG responses are typically transformed into QALY weights (see for example: Versteegh et al., 2016; Brazier et al., 2002). Hence, these comparisons may also illustrate the direction and magnitude of reference-dependence with respect to SLE when TTO and SG weights are derived without accounting for it. Fig. 3 illustrates the aggregate results for our sample. Within versions (i.e. gains or losses), SG weights were significantly higher than TTO weights (Wilcoxon tests, all p's < 0.037). When comparing within valuation methods (i.e. TTO or SG), health state valuation exercises involving losses produced significantly higher QALY weights, both for TTO and SG (Wilcoxon tests, all p's < 0.002). For both methods, the differences between gain and loss versions were of similar magnitude for Q1, Q2 and Q3 (not significantly different, Wilcoxon tests, p's > 0.52). These findings indicate that shifting the gauge duration below SLE resulted in an average increase in TTO weights of between 0.15 and 0.23. For SG, a similar pattern was observed, with significant differences between gains and losses, where moving life years below SLE increased SG weights on average by 0.02 to 0.12. These SLE effects were significantly larger than 0, and larger for TTO weights compared to SG weights across all three health states (Wilcoxon tests, p's < 0.002).
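As an illustration of this normalisation, the sketch below assumes that Eq. (2) and Eq. (3) take their standard forms under EU and the linear QALY model: the TTO weight is the fraction of the gauge duration that is not given up, and the SG weight is one minus the probability of immediate death risked. The function names and example responses are hypothetical.

```python
def tto_weight(years_given_up, gauge_duration=10.0):
    """TTO weight under the linear QALY model (assumed form of Eq. (2))."""
    if not 0 <= years_given_up <= gauge_duration:
        raise ValueError("years_given_up must lie within the gauge duration")
    return (gauge_duration - years_given_up) / gauge_duration

def sg_weight(p_death):
    """SG weight under expected utility (assumed form of Eq. (3))."""
    if not 0 <= p_death <= 1:
        raise ValueError("p_death must be a probability")
    return 1.0 - p_death

# Hypothetical responses of one subject in the gain and loss versions.
tto_gain = tto_weight(4.0)   # gave up 4 of 10 years above SLE
tto_loss = tto_weight(2.0)   # gave up 2 of 10 years below SLE
sg_gain = sg_weight(0.25)    # risked a 25% chance of immediate death
sg_loss = sg_weight(0.20)    # risked a 20% chance of immediate death

# SLE effect: increase in the weight when the gauge duration moves below SLE.
sle_effect_tto = tto_loss - tto_gain
sle_effect_sg = sg_loss - sg_gain
print(sle_effect_tto, sle_effect_sg)
```

Both weights lie on the same 0 to 1 scale, which is what makes the per-subject SLE effects comparable across the two methods.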
We validated these SLE effects using a mixed effects regression, which also showed that our conclusions appear to be unaffected by acceptability of the health states Q1, Q2 and Q3 or gender (see Online Supplements). Finally, we tested whether the typical difference between TTO and SG weights is affected by moving the gauge duration below SLE. To this end, for each subject, we calculated a difference score between TTO and SG per health state, with difference scores being obtained within versions (e.g. TTO-gains vs. SG-gains). This TTO-SG difference was smaller for losses compared to gains (Wilcoxon tests, all p's < 0.02), but differences remained significantly larger than 0 (Wilcoxon tests with μ = 0, all p's < 0.04). Collectively, these findings suggest that moving the gauge duration below SLE increases QALY weights, with this SLE effect being larger for TTO than for SG.

Discussion
This section briefly discusses the results of Study 1 and the main limitations of this experiment that are remedied in Study 2. A more elaborate discussion of the results and limitations can be found in section 6 ('General Discussion').
In accordance with our theoretical predictions, we observed a reduced willingness to give up life years in the TTO-loss version compared to the TTO-gain version (i.e. SLE-effect for TTO). For SG, similar to the TTO results, subjects were reluctant to risk losing life years when deciding about life years that fell short of their expectations (i.e. SLE-effect for SG). When comparing normalised TTO and SG weights (calculated in the common way, based on EU and the linear QALY model), we observed that QALY weights increased when gauge durations were moved below SLE, albeit to a larger extent for TTO. Hence, the difference between TTO and SG was smaller for loss versions. However, the QALY weights elicited for Q1, Q2, and Q3 were low, especially compared to earlier work on health state valuation with general population samples (Versteegh et al., 2016; Devlin et al., 2018). The results from Versteegh et al. (2016) allowed calculating a QALY weight for health states representative of the Dutch general public's valuation (i.e. tariffs). According to these tariffs, Q1, Q2 and Q3 were assigned valuations of 0.88, 0.79 and 0.68, respectively, which are considerably higher than the valuations in Study 1 (especially those elicited with gain versions).
The low QALY weights elicited in Study 1 suggest that students were willing to give up large proportions of their remaining life or accept high risks of immediate death, just to avoid living in health states with relatively minor problems. At least two reasons can be provided to doubt the validity of such responses to TTO and SG. First, students were paid course credits for participation in this study. Generally, in behavioural experiments in health it is preferred to use financial incentives to motivate respondents to carefully consider their responses (Galizzi and Wiesen, 2018). As such, without an incentive to provide effort in our modified versions of TTO and SG, it could be hypothesized that students invested too little effort in considering the consequences of their choices. Hence, to resolve this issue, in Study 2 respondents were provided with a monetary reward for participation. Second, this first experimental exploration of the process by which SLE affects valuations relied on a convenience sample of students. Obviously, this sample is small and not representative of the Dutch population in terms of age and education level. All students were required to imagine being much older than their current age and living in health states they were unlikely to have experienced. This could be problematic, as earlier work has shown that individuals may experience difficulty accurately predicting their future choices (i.e. projection bias, see: Loewenstein et al., 2003). Furthermore, earlier work has found that SLE is associated with both age and education level (Péntek et al., 2014; Rappange et al., 2016), and TTO depends on attitudes regarding ageing and end-of-life (Van Nooten et al., 2016), which may also be different for students compared to older populations. Hence, in Study 2 we applied our empirical tests in a sample of older persons to investigate the external validity of our findings.

Study 2
In the second experiment, to test the external validity of our findings, we aimed to replicate our predicted SLE effects for TTO and SG in an online experiment with individuals aged 60 years and older. The methods were almost identical to those of Study 1, and as such we will only highlight modifications to the method below. Furthermore, as we applied a similar analysis strategy, we will present the results of Study 2 without repeating the (more) detailed descriptions found in section 4.2.

Methods
Study 2 used the same measures of expectations about length and quality of life, health states and operationalisations of TTO and SG as were used in Study 1. However, the experimental task (programmed in Qualtrics) was now distributed online to a sample of 328 people aged 60 years and older. This was done through Prolific, a platform for online research with a large sample of individuals, who mostly live in the UK and US. It allows screening for a wide array of demographics, including age. When this experiment was run, Prolific had around 2600 users who were eligible (i.e. 60 years and older) and active in the last 90 days. Respondents were rewarded £3 for taking part in this experiment. On average it took respondents 24 min to complete the experiment. Only a single question was added to the original set-up of Study 1, i.e. a question to investigate experience with chronic illness. Demographic characteristics for this sample of older people can be found in Table 4. Table 5 reports descriptive statistics on expectations about length and quality of life. The findings for SLE were similar to Study 1, with median SLE being 85 years old. Compared to the students in Study 1, respondents wished to become significantly less old (i.e. SLE-max was smaller) and considered impaired health states acceptable from a higher age onwards (Wilcoxon tests, p's < 0.001). Consequently, only 53% (Q1), 36% (Q2) and 16% (Q3) of the respondents considered these health states acceptable at all ages considered in the experiment. These were significantly smaller proportions than observed in Study 1 (all χ²'s (2, N = 328) > 47.95, all p's < 0.001). A risk of having older respondents complete the experiment is that $t^l_a$ (i.e. the age they are asked to imagine to be in the loss versions of TTO and SG) is lower than their current age. This was the case for 32 respondents (10% of the sample). However, excluding these respondents did not affect our results (see Online Supplements).
Furthermore, compared to Study 1, for all conditions and health states we found larger amounts of non-trading, with rates of non-trading ranging from 12.5% to 34% of the sample (all χ²'s (2, N = 328) > 3.93, all p's < 0.05). As for Study 1, several analyses were performed to check if acceptability of health states affected QALY weights or the main conclusions of our study (see Online Supplements). We also included having a chronic disease in these analyses. Acceptability did not affect QALY weights, but experience with chronic disease was associated with higher QALY weights. However, our main results were similar for those with and without experience with disease (see Online Supplements). Hence, we report on the full sample below. Table 6 shows aggregate results for TTO and SG responses in both versions. As in Study 1, fewer life years were given up in the loss versions of TTO compared to gain versions for all health states (Wilcoxon tests, all p's < 0.001). We observed that for Q1, Q2 and Q3 respectively 50%, 49%, and 41% of the sample gave up fewer life years in loss versions (rather than more or the same), which was a significant majority (all χ²'s (2, N = 328) > 36.67, all p's < 0.001). As can be seen by comparing Tables 3 and 6, the SLE effect for TTO appears smaller for this older sample, but this difference was never significant (Wilcoxon tests, all p's > 0.06). In contrast to Study 1, we found no SLE-effect for SG, i.e. no evidence for lower probabilities of immediate death risked for the loss version compared to the gain version. We observed that for Q1, Q2 and Q3 respectively, 35%, 30% and 31% of our sample was willing to take a smaller risk of immediate death in loss versions (with similar proportions of the sample taking higher risks for loss versions). As can be seen by comparing Tables 3 and 6, the SLE effect for SG was smaller in Study 2, and this difference was indeed significant for all three health states (Wilcoxon tests, all p's < 0.002).
Finally, we explored whether excluding non-trading responses affected our findings for SLE effects for TTO and SG. Although this indeed increased effect sizes for TTO, the conclusions remained qualitatively similar (see Online Supplements). Fig. 4 illustrates the aggregate normalised QALY weights for each version. For each condition, QALY weights were significantly higher compared to Study 1 (Wilcoxon tests, p's < 0.04), except for SG losses for Q1 (Wilcoxon test, p = 0.11). We also compared our results against the QALY weights calculated using the results by Devlin et al. (2018), which represent QALY weights for a sample representative of the UK (i.e. the country of residence for most of

Table 6 Median years given up in TTO and probability of death risked in SG, including within-subject differences between gain and loss versions (columns: Gains, Losses, Diff.).

General discussion
The goal of this paper was to (further) explore SLE effects for TTO and SG by means of a within-subjects approach. We constructed a theoretical model based on prospect theory, which allowed us to test its predictions using different versions for TTO and SG, with a gauge duration occurring either completely below (i.e. losses) or above SLE (i.e. gains). Although EU and the QALY model give no reason to expect differences between these TTO and SG versions, prospect theory, on the other hand, implies that if SLE functions as RP, loss aversion and sign-dependent evaluation of life years and probabilities can give rise to discrepancies between different versions. It was predicted that fewer years would be given up in TTO when elicited below SLE, i.e. TTO weights would be higher. Furthermore, for SG our predictions based on prospect theory suggest that, depending on their loss aversion and probability weighting functions, individuals would be willing to either increase or decrease their risk of immediate death, i.e. the effect on SG weights is ambiguous.
We tested these predictions in two studies, with a sample of students (Study 1) and a sample of individuals aged 60 years and older (Study 2). We find SLE to be similar to estimates from earlier work (Van Nooten et al., 2009; Péntek et al., 2014; Rappange et al., 2016). Furthermore, SLE falls in between maximum and minimum aspired ages, suggesting that it could indeed be taken as RP within prospect theory, as this is typically seen as a neutral position (Wakker, 2010). In accordance with our theoretical predictions, if life years in TTO occurred below SLE, we observed less willingness to give up life years in both Study 1 and Study 2. Hence, our results for TTO confirm the SLE effect observed in earlier work (van Nooten and Brouwer, 2004; Van Nooten et al., 2009; Heintz et al., 2013; van Nooten et al., 2014), where similar comparisons were made between individuals expecting to live longer or shorter than the TTO gauge duration. Furthermore, as it occurs in both the student sample and the sample of older respondents, it appears to be robust to individuals' current age, which provides some support for the external validity of the effect of SLE on TTO. These findings (according to our model) suggest that: a) subjects refrain from giving up life years compared to SLE as a result of loss aversion (as suggested by Van Nooten et al., 2009), and/or b) subjects show less diminishing marginal utility for life duration for losses compared to gains with respect to SLE.
For SG, the results for Study 1 were similar to those for TTO, i.e. students were more reluctant to risk losing life years when deciding about life years that fall short of their expectations. That is, when the gauge duration occurs below SLE, lower chances of immediate death were taken, which is, to our knowledge, a novel finding. However, these results were not replicated for people aged 60 years and older in Study 2. Given that our model based on prospect theory yields ambiguous predictions for SG, it can provide an explanation for this null result in Study 2. For Study 1, our findings suggest that loss aversion decreased willingness to risk immediate death for gauge durations below SLE. Our model predicts that probability weighting for gains and losses may have offset part of the effect due to loss aversion, which may explain the weaker effect of SLE for SG in Study 1, and perhaps the null result in Study 2. To explain the non-significant SLE effect for SG in Study 2, individuals aged 60 years and older would have to be less loss averse and/or have larger differences in probability weighting between gains and losses than the student sample in Study 1.
Although we derived predictions based on assumptions about loss aversion, utility curvature, and probability weighting, we did not include an empirical measurement of these prospect theory parameters. Instead, our predictions were based on earlier work on prospect theory for both health outcomes (e.g. Attema et al., 2013, 2016, 2018; Kemel and Paraschiv, 2018; Lipman et al., 2018, 2019b) and monetary outcomes (Abdellaoui et al., 2008; Bruhin et al., 2010; Kemel and Paraschiv, 2018), where substantial loss aversion and differences in curvature of probability weighting and/or utility functions between gains and losses were observed. As such, our study does not allow us to directly test our theoretical explanations for the SLE effect for TTO, nor to determine why SLE effects for SG could be observed in Study 1 but not in Study 2. Hence, combining our experimental approach with measurement of prospect theory parameters is a promising avenue for future research.
However, existing work suggests that older individuals are typically more loss averse and loss aversion decreases with education level (Gächter et al., 2007;Arora and Kumari, 2015), which would lead us to expect stronger loss aversion in the sample of Study 2. For probability weighting the evidence is inconclusive. Donkers et al. (2001) find some evidence that suggests more pronounced weighting of probabilities for higher ages, but they do not differentiate between gains and losses. As such, at least two alternative explanations for the null result for SG in Study 2 appear relevant. First, QALY weights for health states Q1, Q2 and Q3 were considerably higher in Study 2 compared to Study 1, with especially SG weights for these mild health states nearing 1.00. It may be possible that no SLE effect for SG is observed due to a ceiling effect, which could be tested by incorporating more severe states in future work. Second, it is possible that this null result is explained by differences between the student and older samples in how they perceive life (and death) at the ages considered in this experiment. Our results provide some indication for this, with students indicating to find health problems acceptable from younger ages than individuals aged 60 years and older. Future work could explore, for example using qualitative interview techniques, the influence of these perceptions on the effects of SLE on QALY weights.
Before arriving at the conclusion that SLE serves as RP in TTO and SG, several alternative explanations, not related to reference-dependence with respect to SLE, and methodological limitations should be considered. First, subjects in both studies were asked to imagine being older than their current age. If subjects did not adopt the instructions in our experiment, the gauge durations in this experiment would be strongly discounted (van der Pol and Roux, 2005; Attema and Brouwer, 2010). Given that loss versions of TTO involved years below SLE, these would necessarily occur earlier in time than years given up in gain versions if current age is adopted instead of SLE. Thus, compared to their current age, life years in gain versions are likely to be discounted more strongly, and given up more willingly compared to life years for the loss version of TTO (i.e. this would predict higher QALY weights for loss versions). Similarly, if subjects used their current age instead of the hypothetical ages in our experiment, the time dimension may explain higher utility for SG-losses compared to gains, as for monetary outcomes it is well-known (Abdellaoui et al., 2011; Baucells and Heukamp, 2012; Noussair and Wu, 2006) that risk-seeking increases when lotteries are resolved in the future. As such, SG-gains are resolved further away in the future than loss versions, and thus higher risk-seeking could explain the higher risks of death accepted for gain versions of SG. However, although it is not possible to make sure subjects indeed adopted our instructions, in the Online Supplements we show that if subjects did not adopt the ages in our experiment the effects of discounting would be negligible. Hence, given that we do find significant SLE effects, this is not likely to result from failure to adopt $t^g_a$ and $t^l_a$. Second, scale compatibility has been suggested to bias both SG and TTO (Bleichrodt, 2002; van Osch and Stiggelbout, 2008). Our manipulation, i.e. shifting life years around SLE, may have caused subjects to focus on life duration in TTO and SG. Given that life duration is fixed and equal in both options in SG, while in TTO life duration is varied along the choice list, this may explain the stronger effect of our manipulation on TTO, especially as the RP was also operationalised on the scale of life duration. Third, even though we provided respondents in Study 2 a monetary incentive to diligently complete our experiment, their rewards were not contingent on their choices (such incentive compatibility is typically preferred in economic experiments, Galizzi and Wiesen, 2018). Although earlier work in economics suggests that the use of hypothetical choices as opposed to incentive-compatible choices has little to no effect on preferences (Hertwig and Ortmann, 2001; Camerer and Hogarth, 1999), we encourage the exploration of incentive-compatible choices in the context of health. Fourth, all theoretical predictions in this study were based on prospect theory, and hence, we explicitly assumed prospect theory to hold for decisions about health. Several authors have found violations of prospect theory, mostly for monetary outcomes (Birnbaum, 2006; Payne, 2005; Bateman et al., 2007), but also for health (Feeny and Eng, 2005). As such, future work could explore if TTO and SG can be modelled in other reference-dependent models (Kőszegi and Rabin, 2006). Finally, to accommodate our subjects and avoid confusion or unnecessary errors, we maintained a consistent ordering throughout the experiment. Future work could explore whether this lack of counterbalancing between-subjects could have affected our conclusions, although other authors find no effects of order on gain-loss framing (e.g. De Dreu et al., 1994).

Conclusion
Whereas it is well-known that TTO and SG weights are typically different (e.g. Read et al., 1984; Torrance, 1976), earlier work on the role of SLE has exclusively focused on TTO. Our work suggests that decision-making in both health state valuation methods may be affected by subjective expectations about length of life, with QALY weights being higher for TTO and (to a lesser extent) SG when gauge durations are below SLE; i.e., SLE may serve as RP in health state valuation. This SLE effect could be relevant for the current practice in health state valuation, as this typically involves short gauge durations, which imply losses compared to SLE for a large part of the sample. For example, when obtaining nationally representative TTO tariffs for EQ-5D, EuroQol typically uses a 10-year duration for health states preferred to death (Oppe et al., 2014), which must fall short of SLE for many subjects. Applying derivations based on EU or linear QALYs will then yield TTO or SG weights that are too high.
Although finding a solution for this biasing effect attributable to SLE seems warranted, as discussed by Heintz et al. (2013), it can be complex to choose an appropriate duration for health state valuation. Durations below SLE may induce reluctance to lose any life years at all, while durations above SLE may yield lower QALY weights as individuals are more willing to lose some of these 'bonus years'. To our knowledge, no compelling normative argument exists to prefer either of these scenarios, suggesting that it may be necessary to acknowledge these possible biases and derive health state utility in a reference-dependent model (as discussed by: Abellan-Perpiñan et al., 2009;Lipman et al., 2019a, b). Therefore, we hope that our attempt to unify earlier work on reference points in health state valuation into a formal model based on prospect theory provides some insight into the consequences of not being able to live up to expectations about length of life.

Funding source
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Online Supplements
Supplementary data associated with this article can be found, in the online version, at https://doi.org/10.1016/j.jhealeco.2020.102318.

Declaration of Competing Interest
None.