Aversion to Health Inequality – Pure, Income-Related and Income-Caused

We design a novel experiment to identify aversion to pure (univariate) health inequality separately from aversion to income-related and income-caused health inequality. Participants allocate resources to determine health of individuals. Identification comes from random variation in resource productivity and in information on income and its causal effect. We gather data (26,286 observations) from a UK representative sample (n=337) and estimate pooled and participant-specific social preferences while accounting for noise. The median person has strong aversion to pure health inequality, challenging the health maximisation objective of economic evaluation. Aversion to health inequality is even stronger when it is related to income. However, the median person prioritises health of poorer individuals less than is assumed in the standard measure of income-related health inequality. On average, aversion to that inequality does not become stronger when low income is known to cause ill-health. There is substantial heterogeneity in all three types of inequality aversion.


Introduction
Aversion to health inequality is multifaceted. Concern about differences in health between individuals irrespective of their non-health characteristics motivates prioritisation of health gains to the least healthy. Concern about systematic differences in health between individuals distinguished by income motivates prioritisation of health gains to the poor. Elicitation of social preferences over the distribution of health usually fails to separate aversion to pure health inequality (the first case) from aversion to income-related health inequality (the second case). This risks confounding one type of aversion with the other and so biasing distributionally sensitive evaluation of health programmes that use the elicited preferences.
We design a novel experiment to separately identify the two facets of health inequality aversion.
In the experiment, participants play the role of social decision makers allocating resources to determine the health of individuals. By varying resource productivity between individuals, we force trade-offs between increasing aggregate health and decreasing health inequality.
This identifies aversion to inequality. In one treatment, individuals are anonymous. This identifies aversion to pure health inequality. In a second treatment, we label individuals by randomly assigned incomes. Responses to this information, as well as the trade-offs made between health maximisation and equalisation, simultaneously identify both aversion to income-related health inequality and aversion to pure health inequality.
Aversion to income-related health inequality may derive from the consequences of that inequality for the distribution of well-being defined over health and income. A given marginal distribution of health generates greater inequality in well-being when health and income are positively correlated. This may motivate prioritisation of poorer individuals in the distribution of health resources in order to compensate material disadvantage with improved health.
Aversion to differences in health by income could also derive from a perception that lower income causes worse health and a belief that this is unfair. Many may see injustice in the poor being in worse health because they cannot afford medicine or nutritious food. This would motivate prioritisation of health gains to the poor beyond that deemed appropriate to compensate for poverty through improved health.
To identify the extent to which belief in a causal effect of income on health strengthens aversion to income-related health inequality, we tell participants, in a third treatment, the proportion of differences in resource productivity that is caused by differences in incomes.
This induces exogenous variation in beliefs about income-health causality that is used to identify the effect of these beliefs on prioritisation of the health of poorer (or richer) individuals. Any systematic shift in allocations implies aversion to income-caused health inequality.
We minimise restrictions on the social preferences we can identify by asking participants to allocate resources when richer individuals are both advantaged and disadvantaged in the production of health. We allow preference for a negative, positive, or zero association between health and income. We can identify preference for prioritisation by income that weakens, not only strengthens, when income is known to cause health.
We ran the experiment online and recruited a broadly representative sample of the UK adult population. Repeated choices by 337 participants give 26,286 observations that we use to obtain both pooled and participant-specific estimates. The latter is potentially important given the substantial heterogeneity found in other estimates of social preferences (Cappelen et al., 2007; Fisman, Kariv, and Markovits, 2007; Hurley, Mentzakis, and Walli-Attaei, 2020).
A within-subject design further increases statistical power.
We estimate parameters of a social welfare function that aggregates over a population health profile through a social utility function of each individual's health that is potentially weighted by a function of their income (Makdissi and Yazbeck, 2016). The revealed willingness to sacrifice health maximisation for less inequality identifies the degree of concavity of the utility function that reflects aversion to pure health inequality. If the allocations are independent of income, then the weights are constant and the model collapses to an Atkinson (1970) welfare function that others have used to estimate health inequality aversion (Dolan, 1998; Dolan and Tsuchiya, 2011; Robson et al., 2017). Prioritisation by income identifies weights that decrease or increase with rising income to reflect aversion to pro-rich and pro-poor health inequality, respectively. We use a general weighting function that nests one that underpins the most common measure of income-related health inequality, the (extended) concentration index (Wagstaff, Paci, and Van Doorslaer, 1991; Wagstaff, 2002; O'Donnell et al., 2008). Consequently, the model (Makdissi and Yazbeck, 2016) also encompasses Wagstaff's achievement index that penalises mean health for pro-rich health inequality without allowing for aversion to pure health inequality (Wagstaff, 2002). We extend the weighting function to include two parameters that represent aversion to income-related health inequality and the extent to which it intensifies (or weakens) with beliefs about causality. We use responses to the causal information treatment to separately identify these parameters and so the extent to which the income weights shift with causality beliefs.
Non-parametric analysis reveals that no participant maximises aggregate health. All are averse to pure health inequality. This is consistent with evidence that most people state a willingness to sacrifice health maximisation in order to reduce health inequality (Abellan-Perpiñan and Pinto-Prades, 1999;Ubel and Loewenstein, 1996;Ratcliffe, 2000;Gyrd-Hansen, 2004;Dolan et al., 2005;Richardson et al., 2012). We find that more than three quarters of participants are prepared to sacrifice health maximisation to an extent that involves giving more (not just any) resources to individuals who benefit less from them. On average, priority is given to the health of the poor. This is driven by less than a quarter (23%) of participants who allocate to ensure that the poorest individual gets more health than the richest. On average, information about the extent to which income causally determines health has no impact on the allocations, which suggests that aversion to income-related health inequality is not contingent on beliefs about causality.
Parametric analysis gives a pooled estimate of constant relative health inequality aversion of 1.4, which indicates moderate prioritisation of the least healthy. However, there is substantial heterogeneity, as in related studies (Cropper, Krupnick, and Raich, 2016;Hurley, Mentzakis, and Walli-Attaei, 2020). The median participant-specific estimate of this parameter is 3.2, indicating that a majority displays substantial aversion to pure health inequality.
The estimate increases slightly to 3.5 when income weights are estimated simultaneously.
Both the pooled and median estimates indicate weak prioritisation of the health of the poor after taking account of aversion to pure health inequality. Weights decline with rising income rank but less rapidly than those implicit in the concentration index. That index forces aversion to health inequality and prioritisation of health by income into a single parameter that determines the weights. We separate these two dimensions of social preferences and so avoid limitations of the concentration and achievement indices. One is a lack of aversion to health differences that are not related to income. Another is a restriction on the (abbreviated) social welfare function that, paradoxically, implies that welfare increases with health inequality provided it is pro-poor (Makdissi and Yazbeck, 2016). In fact, our parametric estimates indicate that a little less than a quarter of the UK population is pro-rich: all else equal, they prioritise the health of richer over poorer individuals. Both pooled and median estimates of the degree of aversion to income-related health inequality are insensitive to information on causality.
Specifying social preferences in the form of an Atkinson social welfare function based on a relative conception of inequality combined with rank-dependent income weights fits the data slightly better than both a Kolm-Pollak social welfare function based on absolute inequality (Pollak, 1971; Kolm, 1976) and share-dependent income weights. Neither the Atkinson welfare function nor Wagstaff's achievement index is sufficiently flexible by itself to capture median preferences. There is clear evidence of consequentialism: participants allocate resources to optimise the distribution of health.
Beyond our empirical findings, we make three methodological contributions that can improve the reliability and extend the scope of evidence on health inequality aversion. This is the first study to simultaneously estimate aversion to pure health inequality and income weights that together determine aversion to income-related health inequality. Many studies elicit aversion to only one type of health inequality (Abasolo and Tsuchiya, 2004;Dolan and Tsuchiya, 2011;Cookson et al., 2018;Robson et al., 2017;Hardardottir, Gerdtham, and Wengström, 2021). Some elicit aversion to pure and income-related health inequality separately, with each of these social preferences represented by a different single-parameter welfare function (Dolan and Tsuchiya, 2009;Hurley, Mentzakis, and Walli-Attaei, 2020;McNamara, Tsuchiya, and Holmes, 2021). One study estimates a two-parameter welfare function but imposes independence between the aversion to pure health inequality and the socioeconomic weights that the parameters represent (Pinho and Botelho, 2018). Simultaneous estimation of the two parameters is important because restricting attention to one fails to distinguish between concern for the poor and concern for the less healthy. This confounding will upwardly bias willingness to prioritise health of poorer individuals when health and income are restricted to be positively correlated, which, as far as we know, is the case in all previous studies. In our set up, income is orthogonal to (potential) health. This is also the first study to allow aversion to income-related health inequality to depend on the extent to which income causes better (or even worse) health. An outcome-focused concern about poor people also being unhealthy can be distinguished from a procedural concern about people being unhealthy because they are poor (Schokkaert and Devooght, 2003).
A presumption of greater aversion to income-related health inequality that arises from a causal effect of income on health has not been tested until now. If aversion were to depend on causality, then causal evidence would be required not only to design policies that reduce income-related health inequality but also to assess the normative rationale for such policies.
A third innovation is that our design and empirical approach make it possible to estimate participant-specific inequality aversion while allowing for noise in the participant's choices.
Most studies elicit participant-specific preferences without allowing for noise (Abasolo and Tsuchiya, 2004; Dolan and Tsuchiya, 2009; Dolan and Tsuchiya, 2011; Robson et al., 2017; Cookson et al., 2018; Hardardottir, Gerdtham, and Wengström, 2021; McNamara, Tsuchiya, and Holmes, 2021). This risks the mistaken inference of preferences from choices that are simply random errors. These studies often drop a substantial proportion of responses that appear irrational. Some studies allow for noise, but only when pooling data across participants (Edlin, Tsuchiya, and Dolan, 2012; Hurley, Mentzakis, and Walli-Attaei, 2020), and so lose the opportunity to estimate participant-specific preferences. We avoid these limitations by estimating inequality aversion from allocations of resources in a series of constrained optimisation problems that present equity-efficiency trade-offs. We allow for the mistakes each participant inevitably makes by estimating a random behavioural model to infer preferences from error-prone allocations that are assumed to be optimal only on average (Harless and Camerer, 1994; Conte and Moffatt, 2014; Robson, 2021).
The next section describes the experiment. Section 3 presents models of social welfare that we fit to the data. Section 4 explains estimation of the model parameters allowing for noise. Section 5 presents results from non-parametric and parametric analyses and compares the data fit of alternative models. Section 6 gives an illustrative application to policy evaluation that demonstrates the value of obtaining participant-specific estimates and the importance of estimating and using aversion to pure health inequality and to income-related weights simultaneously. Section 7 compares our estimates to previous evidence, considers explanations for findings, and acknowledges limitations. The final section concludes.

General Setup
We ran an online interactive experiment to elicit social preferences over the distribution of health. Participants were recruited through Prolific and are broadly representative of the UK adult population.
Preferences are inferred from choices in a series of constrained optimisation problems that pose equity-efficiency trade-offs. In each round, a participant was assigned a randomly generated budget and asked to allocate resources to three hypothetical individuals. They were forced to exhaust the budget. The participant was told that the health of each individual would be the product of the resources allocated to that individual and an individual-specific productivity factor referred to as a multiplier (Arrow, 1971). These multipliers varied from round to round.
Participants were told that health is the number of years an individual lives adjusted for illness or disability. They were given an example to encourage them to interpret this as equivalent years lived in full health, that is, quality-adjusted life years (QALYs) (Appendix A.2).
The task was completed using an online screen interface designed in R Shiny that is shown in Figure 1. The participant allocates resources using sliders, and can use arrow keys to refine allocations. Resources and the resulting health outcomes are shown graphically by the blue and black bars, respectively, and numerically in the table. The (remaining) budget is shown on the left of the screen. Summary measures are on the right. The Resource Gap is the largest absolute difference between resources allocated to two individuals. The Health Gap is the equivalent for health. Total Health is the sum of the health outcomes. To encourage deliberation, minimum timers were placed on each round.
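The three summary measures shown on the interface follow directly from the definitions above. The sketch below is illustrative only and is not the R Shiny code used in the experiment; the function name `summary_measures` and the example allocation and multipliers are hypothetical.

```python
from itertools import combinations


def summary_measures(resources, multipliers):
    """Compute the on-screen summary measures for one round.

    Health of each individual is multiplier * resources, as stated in the text.
    The example numbers used below are hypothetical, not taken from the experiment.
    """
    health = [p * y for p, y in zip(multipliers, resources)]

    def largest_gap(values):
        # Largest absolute difference between any two individuals.
        return max(abs(a - b) for a, b in combinations(values, 2))

    return {
        "resource_gap": largest_gap(resources),
        "health_gap": largest_gap(health),
        "total_health": sum(health),
    }


# Hypothetical round: a budget of 60 split 30/20/10 with multipliers 0.5, 1 and 1.
example = summary_measures([30, 20, 10], [0.5, 1.0, 1.0])
print(example)
```

In this hypothetical round, the individual who receives the most resources does not end up with the most health, which is the kind of trade-off the summary measures are meant to make visible.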

Treatments
The experiment had three within-subject treatments that are used to identify aversion to A) pure health inequality, B) income-related health inequality, and C) income-caused health inequality. The treatments differ with respect to information participants were given about a) the identity of the three individuals and b) how the multipliers are determined. All participants faced the same chronology of treatments: A, B, and, finally, C. We could not randomise the order as each treatment reveals more information.

Treatment A: Anonymous
In Treatment A, participants were given no information about the identity of the individuals, who were labeled by randomly drawn initials (e.g. CS, SJ, and TD). The multipliers ($p_i$) changed across 10 rounds (see Appendix A.3 Table A1). Between-individual differences in $p_i$ force trade-offs between health maximisation and equalisation that are used to identify aversion to health inequality. The distribution of $p_i$ over the 10 rounds is the same irrespective of the screen position of the individuals. This ensures that comparisons between the positions can be made without bias. The order of the 10 rounds was randomised across participants and the set of initials was drawn randomly for each round and participant.

Treatment B: Income
In Treatment B anonymity was lifted. Participants were told the income of each individual.
Each participant again allocated resources in 10 randomly ordered rounds. The distribution of multipliers over the rounds is identical to that used in Treatment A, allowing direct comparison between the treatments. In each round and for each participant, we randomly selected (without replacement) three incomes from {£5,000, £10,000, £25,000, £50,000, £100,000} and used them to label the three individuals, e.g. HD: £10,000. 1 Since different sets of three random draws give the same income ranks but different income shares, we can distinguish between aversion to income-rank and income-share related health inequality.
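The distinction between ranks and shares can be illustrated with two possible draws from the income set. This is a minimal sketch; the helper `ranks_and_shares` is a hypothetical name, and the two draws are chosen for illustration rather than taken from the data.

```python
def ranks_and_shares(incomes):
    """Return income ranks (1 = poorest) and income shares for one draw.

    Assumes all incomes in a draw are distinct, as when sampling
    without replacement from the income set.
    """
    ordered = sorted(incomes)
    total = sum(incomes)
    ranks = [ordered.index(x) + 1 for x in incomes]
    shares = [x / total for x in incomes]
    return ranks, shares


# Two illustrative draws of three incomes from the set used in Treatment B.
ranks_a, shares_a = ranks_and_shares([5000, 10000, 25000])
ranks_b, shares_b = ranks_and_shares([5000, 50000, 100000])

print(ranks_a, [round(s, 3) for s in shares_a])
print(ranks_b, [round(s, 3) for s in shares_b])
```

Both draws assign the same ranks to the three individuals, but the poorest individual holds a much smaller share of total income in the second draw, so rank-dependent and share-dependent weights would respond differently.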

Treatment C: Income-Causation
In Treatment C, participants were told not only the income ($x_i$) of each of the three individuals but also the extent to which income differences cause differences in multipliers.
In the two scenarios presented, the set of multipliers was {0.33, 0.5, 1}. 2 In one scenario, these multipliers monotonically increase with income: health interventions are more effective at higher incomes. In the other, they monotonically decrease as income rises. For each scenario, the participant was told, in three different rounds, that the percentage of the multiplier differences caused by income is 0%, 100%, and X%, where X was randomly drawn from {20, 40, 60, 80}. We refer to this as causal information ($I$). See Appendix A.5 for the script and interface.
1 The order of the income-labelled individuals on the screen was also randomised between participants, either increasing or decreasing with income. The set of incomes was chosen to give sufficient variation for identification.
2 To facilitate comparison, this is the same set used in rounds 7 and 8 of Treatments A and B (Appendix A.3 Table A1).
Each participant completed six rounds in this treatment (Appendix A.3 Table A2). The design ensures that correlation between the multipliers and health is orthogonal to the randomly assigned causal information. 3 Orthogonality allows aversion to income-caused health inequality to be separately identified from aversion to non-causal income-health association. This is done by leveraging the variation in resource allocations made from round to round in response to the causal information.

Timing and sample
After conducting two pilot experiments (Appendix A.6), the experiment was conducted in two sessions over a three-week period. In the first session (December 14-17, 2021), participants received instructions, followed an interactive tutorial, answered follow-up questions of comprehension (Appendix A.2), did Treatment A (10 rounds), and completed a questionnaire about sociodemographic characteristics and beliefs. In the second session (December 18, 2021 to January 5, 2022), the participants followed a shorter tutorial, did Treatment B (10 rounds), completed a belief elicitation exercise (Appendix A.4), did Treatment C (6 rounds), and completed a final questionnaire. The median completion time was 29.0 minutes in the first session and 27.6 minutes in the second. 4 See Appendix A.1 for a graphical overview of the experiment.
3 In addition, each of a) the incomes of the three individuals, b) their screen order (increasing or decreasing with income), c) the order of the scenarios distinguished by whether multipliers are increasing or decreasing with income, and d) the order of presenting the causal information as 0%, 100%, and X% was (separately) randomised between participants.
4 Participants were paid £3.50 for the first session and £5 for the second. The average payment was £9.01 per hour over the two sessions.
Of the 402 participants who completed the first session, 21 (5.2%) did not complete the second session. We drop an additional 44 participants who incorrectly answered 3 or more of 5 comprehension questions after the tutorial in the first session. We test robustness to this exclusion (Appendix D.4). This leaves an analysis sample of 337 participants with complete data from both sessions. The sample is younger than the UK adult population but representative with respect to sex and ethnicity (see Appendix B Table B1).
The 337 participants made choices in 26 rounds across three treatments, with each round involving allocations to three individuals. This gives a total of 26,286 (= 337 × 26 × 3) observations. Participants were only shown information on income of the individuals in treatments B and C (16 rounds), and so there are 16,176 observations for income-related allocations. Analyses conducted only with Treatment A or Treatment B data use 10,110 (= 337 × 10 × 3) observations in each case. Table 1 summarises the data used for estimation.
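The sample and observation counts reported above can be checked with a few lines of arithmetic. The sketch uses only figures stated in the text; the variable and function names are illustrative.

```python
# Sample construction, using the counts reported in the text.
completed_first_session = 402
did_not_return = 21          # did not complete the second session
failed_comprehension = 44    # 3 or more of 5 comprehension questions wrong

participants = completed_first_session - did_not_return - failed_comprehension

# Rounds per treatment and individuals per round.
rounds = {"A": 10, "B": 10, "C": 6}
individuals_per_round = 3


def observations(treatments):
    """Allocation observations across the given treatments (one per individual per round)."""
    n_rounds = sum(rounds[t] for t in treatments)
    return participants * n_rounds * individuals_per_round


total_obs = observations("ABC")    # all three treatments
income_obs = observations("BC")    # rounds in which incomes were shown
single_obs = observations("A")     # Treatment A alone (same count as Treatment B alone)

print(participants, total_obs, income_obs, single_obs)
# prints: 337 26286 16176 10110
```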

Social Welfare Function
Participants take the role of a social decision maker (SDM) who is assumed to maximise social welfare ($W$) that is an aggregation of health ($h_i$) (QALYs) of individuals ($i$) in a population of size $N$ (= 3 in the experiment). We use participants' resource allocations, and how they respond to variation in multipliers, incomes, and causal information, to identify parameters of a weighted utilitarian social welfare function (SWF) (Fleming, 1952; Harsanyi, 1955; Vickrey, 1960),
$$W = \sum_{i=1}^{N} \omega_i U(h_i), \qquad (1)$$
where $0 \le \omega_i \le 1 \ \forall i$ and $\sum_{i=1}^{N} \omega_i = 1$ are weights. The social utility function, $U(.)$, is common across individuals and represents the social preference for health of the SDM (Wagstaff, 1991; Bleichrodt, 1997; Dolan, 1998; Bleichrodt, Doctor, and Stolk, 2005). It is assumed to be concave and could be linear. If it is strictly concave and the weights are constant ($\omega_i = \omega \ \forall i$), then there is aversion to pure health inequality in the sense that a transfer of health from a healthier to a less healthy individual increases welfare (Wagstaff, 1991; Dolan, 1998).
We restrict attention to the iso-elastic function, $U(h_i) = (h_i^{1-\varepsilon} - 1)/(1 - \varepsilon)$ for $\varepsilon \ge 0$ and $\varepsilon \ne 1$, and $U(h_i) = \ln(h_i)$ for $\varepsilon = 1$ (Atkinson, 1970). Welfare can be measured by the equally distributed equivalent (EDE) level of health,
$$h_{EDE} = \begin{cases} \left( \sum_{i=1}^{N} \omega_i h_i^{1-\varepsilon} \right)^{1/(1-\varepsilon)} & \text{if } \varepsilon \ne 1 \\ \prod_{i=1}^{N} h_i^{\omega_i} & \text{if } \varepsilon = 1 \end{cases} \qquad (2)$$
With constant weights, the parameter $\varepsilon$ captures the trade-off the SDM is willing to make between maximising aggregate health and equalising the distribution of health. With $\varepsilon > 0$, there is willingness to forgo health maximisation in order to reduce inequality: aversion to pure health inequality. As $\varepsilon$ increases, this inequality aversion intensifies and social welfare becomes more sensitive to the lowest levels of health. As $\varepsilon \to \infty$, the SDM's preferences approach the Rawlsian maximin: only health improvements experienced by the least healthy raise welfare.
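To make the role of ε concrete, the sketch below computes the iso-elastic utility and the EDE level of health that it implies, defining the EDE as the health level that, if received by everyone, gives the same weighted sum of utilities. The function names and the example health profile are hypothetical.

```python
import math


def utility(h, eps):
    """Iso-elastic social utility of health h (h > 0) with inequality aversion eps >= 0."""
    if eps == 1:
        return math.log(h)
    return (h ** (1 - eps) - 1) / (1 - eps)


def ede_health(health, weights, eps):
    """Equally distributed equivalent health for weights that sum to 1."""
    if eps == 1:
        # Weighted geometric mean.
        return math.exp(sum(w * math.log(h) for w, h in zip(weights, health)))
    power_mean = sum(w * h ** (1 - eps) for w, h in zip(weights, health))
    return power_mean ** (1 / (1 - eps))


# Hypothetical health profile (QALYs) with constant weights.
health = [10.0, 20.0, 30.0]
weights = [1 / 3] * 3

for eps in (0, 1, 3, 10):
    print(eps, round(ede_health(health, weights, eps), 2))
```

With ε = 0 the EDE equals mean health. As ε rises the EDE falls towards the health of the worst-off individual, so the gap between mean health and the EDE measures the welfare cost of inequality.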
The weights can be a function of non-health characteristics that are possibly correlated with health and may even determine health (Wagstaff, 1991;Dolan and Tsuchiya, 2009;Makdissi and Yazbeck, 2016). 5 This allows the social value of an individual's health to depend on their non-health characteristics.
We use the experiment to ascertain whether the weights depend on income. 6 If they do, it may be because income is considered to be a causal determinant of health and this is judged to be an unfair source of health differences. Alternatively, priority may be given to the health of poorer people to compensate for their material disadvantage. The experiment is designed to distinguish between these two motivations for income-dependent weights. We do not restrict the weights to be decreasing in income. Some may prioritise the health of the economically better off due to a belief that the marginal social value of health is increasing in income. 7
5 We assume that health does not directly determine the weights. This distinguishes eq.(1) from the (nonlinear) rank-dependent QALY model (Bleichrodt, Diecidue, and Quiggin, 2004; Bleichrodt, Doctor, and Stolk, 2005). Our experiment set-up does not permit a formal test of eq.(1) with $\omega_i = \omega \ \forall i$ against the rank-dependent QALY model consisting of eq.(1) with $\omega_i = \upsilon(i/N) - \upsilon((i-1)/N)$, $\upsilon()$ non-decreasing, and $h_i \le h_{i-1} \ \forall i$. However, in Appendix E we show that a model consisting of concave $U()$ and health-dependent weights does not fit the data substantially better than a restricted version that imposes constant weights.
6 The income-related distribution of health can be evaluated without considering the direct effect of the income distribution on welfare provided the SDM's evaluation of individual well-being is additively separable in health and income (Makdissi and Yazbeck, 2016). Conditional on this restriction, income-dependent weights allow welfare evaluation of interventions that impact health differently depending on income without affecting the income distribution. With our experimental setup, even with additive separability, it would not be possible to identify parameters of a SWF over well-being determined by both health and income. Such an extension might demand too much cognitive effort from participants.
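We elicit preferences by asking participants to allocate resources ($y_i$) while revealing the consequences for the distribution of health, which are generated by the health production function, $h_i = p_i \times y_i$. 8 Maximisation of social welfare ($h_{EDE}$) subject to a binding budget constraint, $\sum_{i=1}^{N} y_i = m$, gives the optimal allocations (for $\varepsilon > 0$),
$$y_i^* = m \, \frac{\left( \omega_i \, p_i^{1-\varepsilon} \right)^{1/\varepsilon}}{\sum_{j=1}^{N} \left( \omega_j \, p_j^{1-\varepsilon} \right)^{1/\varepsilon}}. \qquad (3)$$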
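The constrained maximisation can be sketched numerically. The code below implements the interior solution of the first-order conditions for the iso-elastic SWF with health equal to multiplier times resources, assuming ε > 0. It is an illustration, not the estimation code; the function names and the budget of 60 are hypothetical.

```python
import math


def utility(h, eps):
    """Iso-elastic social utility of health h with inequality aversion eps."""
    if eps == 1:
        return math.log(h)
    return (h ** (1 - eps) - 1) / (1 - eps)


def welfare(resources, multipliers, weights, eps):
    """Weighted sum of utilities of health, where health = multiplier * resources."""
    return sum(w * utility(p * y, eps)
               for w, p, y in zip(weights, multipliers, resources))


def optimal_allocation(multipliers, weights, eps, budget):
    """Welfare-maximising allocation that exhausts the budget (requires eps > 0).

    From the first-order conditions, y_i is proportional to
    (w_i * p_i ** (1 - eps)) ** (1 / eps).
    """
    if eps <= 0:
        raise ValueError("the closed form requires eps > 0")
    scores = [(w * p ** (1 - eps)) ** (1 / eps)
              for w, p in zip(weights, multipliers)]
    total = sum(scores)
    return [budget * s / total for s in scores]


multipliers = [0.33, 0.5, 1.0]   # multiplier set reported for Treatment C
weights = [1 / 3] * 3            # constant weights
budget = 60.0                    # hypothetical budget

for eps in (0.5, 1.0, 3.0):
    alloc = optimal_allocation(multipliers, weights, eps, budget)
    health = [p * y for p, y in zip(multipliers, alloc)]
    print(eps, [round(y, 1) for y in alloc], [round(h, 1) for h in health])
```

Under constant weights, the formula gives more resources to the most productive individual when ε < 1, equal resources when ε = 1, and more resources to the least productive individual when ε > 1.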

Equity Weights
Specification of the weights distinguishes aversion to pure health inequality (differences in the health of anonymous individuals) from aversion to income-related health inequality, and the latter from aversion to income-caused health inequality.

Constant Weights
If a SDM considers health to be the only characteristic that is relevant to the allocation of resources, then they will give all individuals equal weight:
$$\omega_i = \frac{1}{N} \quad \forall i. \qquad (4)$$

Income-Dependent Weights
Let x i be income and order individuals from poorest to richest, x i−1 ≤ x i . We capture aversion to income-related health inequality by weights that are a function of income ranks, The assumption that income rank, and not income level, affects the social value of health is consistent with the predominant use of rank-dependent (concentration) in-dices to measure income-related health inequality (Wagstaff, Paci, and Van Doorslaer, 1991;O'Donnell et al., 2008). We specify the weights in a way that is standard for these indices (Donaldson and Weymark, 1980;Donaldson and Weymark, 1983;Yitzhaki, 1983;Wagstaff, 2002): where β > 0 reflects the direction and degree of health prioritisation by income. 9 With β = 1, the weights are constant -there is no aversion to income-related health inequality.
There can still be aversion to health inequality. But, if there is, the welfare loss generated by that inequality is not larger when part of it is related to income. With β > 1, the weights decrease monotonically as income rank increases -there is aversion to pro-rich health inequality. With 1 < β < 2, the weight-income rank function is negatively sloped and concave. At β = 2, it is the linear weighting function of the standard concentration index: ω B i (2) = (2N − 2i + 1) /N 2 (Wagstaff, Paci, and Van Doorslaer, 1991). With β > 2, the function is convex and relative weights on poorer individuals increase. As β → ∞, the weights approach zero for all but the poorest individual.
While measurement of income-related health inequality usually imposes weights that decline with income (Wagstaff, 2002;Bleichrodt and van Doorslaer, 2006;Erreygers, Clarke, and Van Ourti, 2012), eq.(5) can accommodate aversion to pro-poor inequality. With 0 < β < 1, the weights increase monotonically and convexly with income rank. Values closer to 0 give greater weight to the very rich. 10 9 We calculate the weight for individual i over the interval [r(x i−1 ), r(x i )] to account for the small-sample bias that arises when β = 2 (Erreygers, Clarke, and Van Ourti, 2012). Solving the definite integral simplifies eq.(5) to ω . 10 One could allow for an even broader array of social preferences with the tractable, yet flexible, beta density function. This would involve specifying, ω B i (α, β) = r(xi) r(xi−1) Γ(α+β) Γ(α)Γ(β) q α−1 (1 − q) β−1 dq, with Γ(.) representing the gamma function and α, β > 0. This weighting function collapses to eq.(5) when α = 1. In The model of social preferences given by eq.(2) and eq.(5) nests: a) an Atkinson SWF that allows for aversion to relative inequality in the univariate distribution of health but with no aversion to income-related health inequality (β = 1) and b) the SWF of Wagstaff's (2002) achievement index that allows for aversion to income-related health inequality through the extended concentration index but with no aversion to pure health inequality (ε = 0).

Causality-Dependent Weights
We allow the strength of aversion to income-related health inequality to depend on beliefs about the extent to which variation in income causes variation in health by generalising eq.(5) to where β 1 > 0 and β 2 > 0 capture aversion to income-related and income-caused health inequality, respectively, and λ ∈ [−1, 1] represents beliefs about causality. 11 The sign of λ indicates the perceived direction of any causal effect of income on health. The magnitude of this parameter is the perceived proportion of health differences caused by income differences.
If the SDM believes that none of the observed health inequality is caused by income differences, then λ = 0 and eq.(6) collapses to eq.(5) with β = β 1 . In that case, while the SDM may be averse to income-caused health inequality (β 2 = 1), this would not affect their resource allocations because they believe there is no such inequality. If such a SDM favours either poorer or richer individuals when allocating resources, this behaviour must be motivated by a concern about non-causal income-related health inequality, which will be reflected in the parameter β 1 .
its unrestricted form, it allows for concavely increasing weights with income, weights that center around the median income rank, and weights with positive or negative skewness. 11 Solving the definite integral, eq.(6) simplifies to ω C i (β 1 , β 2 , λ) If the SDM believes that income differences cause health differences, at least to some extent, then λ = 0 and β 2 plays a role in the determination of social welfare and in resource allocations. The weight (and allocations) to the income poor, which is determined by β 1 β λ 2 , will increase if either a) the SDM believes there is a positive causal effect of income on health (λ > 0) and is averse to the resulting pro-rich health inequality (β 2 > 1), or b) the SDM believes there is a negative causal effect of income on health (λ < 0) and likes the resulting pro-poor health inequality (β 2 < 1). The weight to the income poor will decrease if either c) income is believed to have a negative causal effect on health (λ < 0) and there is aversion to the resulting pro-poor inequality (β 2 > 1), or d) income is believed to have a positive causal effect on health (λ > 0) and there is preference for the resulting pro-rich inequality (β 2 < 1).
If the SDM is indifferent to income-caused health inequality (β 2 = 1), then beliefs about the direction and magnitude of that inequality (λ) do not affect the weights. 12 To separately identify β 1 and β 2 , we use induced random variation in beliefs about the extent to which income-related health inequality (in each direction) is caused by income differences. We assume that participants believe entirely the causal information (I) provided in Treatment C on the percentage of multiplier (potential health) differences between individuals that is caused by income differences (section 2.2.3). Under this assumption, we set λ according to that information. This fixes λ at zero and at two positive and two negative values for each participant. We can then identify β 1 and β 2 from resource allocations made in these different cases.
12 See Appendix C for a summary of all cases.
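The four cases (a) to (d) above follow directly from the composite term $\beta_1 \beta_2^{\lambda}$. The short Python sketch below is only an illustration of that logic: the function names are hypothetical, and it assumes, as the text states, that a larger value of the composite term means more weight on the income poor.

```python
def composite_income_parameter(beta1: float, beta2: float, lam: float) -> float:
    """Return beta1 * beta2**lam, the term said to determine the weight to the poor."""
    if beta1 <= 0 or beta2 <= 0:
        raise ValueError("beta1 and beta2 must be positive")
    if not -1.0 <= lam <= 1.0:
        raise ValueError("lambda must lie in [-1, 1]")
    return beta1 * beta2 ** lam


def causal_effect_on_poor_weight(beta2: float, lam: float) -> str:
    """Classify how causal beliefs shift the weight to the poor relative to lambda = 0."""
    factor = beta2 ** lam  # equals 1 when lambda = 0 or beta2 = 1
    if factor > 1.0:
        return "increase"   # cases (a) and (b)
    if factor < 1.0:
        return "decrease"   # cases (c) and (d)
    return "unchanged"      # no perceived causality, or indifference to it


if __name__ == "__main__":
    for beta2, lam in [(1.2, 0.5), (0.8, -0.5), (1.2, -0.5), (0.8, 0.5), (1.0, 0.7)]:
        print(beta2, lam, causal_effect_on_poor_weight(beta2, lam))
```

With λ = 0 the composite term equals β 1 , which mirrors the collapse of eq.(6) to eq.(5).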

Non-Consequentialism
Some participants may allocate resources without considering consequences for the distribution of health. We can accommodate such non-consequentialist ethics by substituting y i for h i as the argument of the iso-elastic social utility function in eq.(1), which gives eq.(7), and solving for the EDE allocation of resources. 13 In this case, the optimal allocations, given by eq.(8), are a function of the budget, the weights, and inequality aversion, but not the health production function parameters.

Absolute Invariance
Equation (2) captures willingness to sacrifice health maximisation for less relative health inequality. To accommodate SDMs who are concerned about absolute health inequality, we also consider the Kolm-Pollak family of SWFs (Pollak, 1971; Kolm, 1976), extended to allow aggregation of health to depend on income. We do this by specifying the SWF in eq.(9). The allocation of resources that maximises this indicator of welfare is given by eq.(10). The non-consequentialist SDM who is concerned about absolute resource inequality would make an optimal allocation that is given by eq.(10) with multipliers (p i , p j ) set to 1.

Share-dependent Weights
While specifying weights as a function of income ranks is consistent with the predominance of rank-dependent measurement of income-related health inequality (Wagstaff, Paci, and Van Doorslaer, 1991; O'Donnell et al., 2008), it does not allow for the possibility that a SDM pays attention to cardinal incomes in prioritising the health of individuals. To accommodate social preferences of this kind, we specify weights that are a function of income shares, x j , and normalise them to sum to 1 (eq.(11)). 14 With γ = 1, the weights are constant and there is no aversion to income-share-related health inequality. With γ > 1, the weights decline with increasing income share, reflecting aversion to pro-rich inequality. With γ < 1, the weights increase with income share.

Estimation
We estimate parameters that reflect aversion to pure health inequality (ε or θ) and income-related/caused health inequality (β, β 1 and β 2 , or γ) through weights specified by eq.(5), eq.(6), or eq.(11). Within each treatment, the decision problem varies across rounds with the budget (m), multipliers (p i ) and, for Treatments B and C, incomes (x i ) of individuals. We use this variation and a random behavioural model to estimate participant-specific parameters.
Orthogonality of the multipliers to the income ranks of individuals allows us to estimate the response of resource allocations to these ranks and so to identify the β parameter and income-dependent weights. Exogenous variation in causal information on differences in multipliers caused by income allows estimation of the response of allocations to this information, which identifies the β 1 and β 2 parameters, and so causality-dependent weights.
Estimation involves maximising the likelihood of observing the resource allocations a participant chooses under the assumption that these allocations are optimal on average but are subject to error. Depending on the specification of the SWF, the optimal resource allocations are given by eq.(3), eq.(8), or eq.(10). The observed resource shares, ỹ i = y i /m, are assumed to be drawn from the distribution of a random variable, Ỹ i , that equals the optimal resource share in expectation, $E[\tilde{Y}_i] = \tilde{y}_i^*$. We assume that the vector of observed resource shares allocated over the three individuals in each round follows a Dirichlet distribution with density $f(\tilde{y}_1, \tilde{y}_2, \tilde{y}_3; \alpha_1, \alpha_2, \alpha_3) = \frac{\Gamma\left(\sum_{i=1}^{3} \alpha_i\right)}{\prod_{i=1}^{3} \Gamma(\alpha_i)} \prod_{i=1}^{3} \tilde{y}_i^{\alpha_i - 1}$, where α i > 0 and Γ() is the gamma function (Robson, 2021). 15 The α i parameters determine the shape of the distribution. Here, they capture the relative weight given to each individual.
15 The Dirichlet distribution is a (flexible) multivariate generalisation of the Beta distribution, with each share bounded between 0 and 1.
From the properties of the Dirichlet distribution and the assumption that the actual resource shares are equal to the optimal shares in expectation, we have $E[\tilde{Y}_i] = \alpha_i / \sum_{j} \alpha_j = \tilde{y}_i^*$. We assume that the variance of Ỹ i is inversely related to σ, where σ > 0 is a precision parameter that reflects noise in the choices that generate the observed allocations. The larger is σ, the lower is the variance of each observed allocation for any given vector of optimal allocations. From the distributional assumption, it follows that the shape parameters, α i , are determined by the optimal shares, ỹ * i , and σ. The preference parameters determine the optimal allocation of resource shares to individuals, ỹ * i , and together with the shape parameters, α i , these determine the observed resource shares ỹ i . For each participant, k, the estimated parameters are those that maximise the log-likelihood function defined over all the rounds t ∈ T of a treatment (eq.(15)). 16
We also estimate population-averaged parameters by pooling the data over all participants and all rounds within a treatment, defining the log-likelihood as the sum of the participant-specific contributions ($\sum_{k=1}^{K} LL_k$), and estimating one set of parameters that capture the preferences and weights of a representative SDM.
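To make the estimation logic concrete, the following Python sketch evaluates a Dirichlet log-likelihood for observed resource shares given a vector of optimal shares and a precision parameter. It is a minimal illustration, not the estimation code used in the paper. In particular, the mapping α i = σ × ỹ * i is an assumption adopted here because it gives the optimal share as the expected share and lowers the variance as σ rises; the function names are hypothetical.

```python
import math
from typing import Sequence


def dirichlet_log_density(shares: Sequence[float], alphas: Sequence[float]) -> float:
    """Log of the Dirichlet density at `shares` with shape parameters `alphas`."""
    if len(shares) != len(alphas):
        raise ValueError("shares and alphas must have the same length")
    if any(s <= 0.0 for s in shares):
        # The density is not defined in logs at a zero share; such allocations
        # would need separate handling (not covered by this sketch).
        raise ValueError("shares must be strictly positive")
    log_norm = math.lgamma(sum(alphas)) - sum(math.lgamma(a) for a in alphas)
    return log_norm + sum((a - 1.0) * math.log(s) for a, s in zip(alphas, shares))


def round_log_likelihood(observed: Sequence[float],
                         optimal: Sequence[float],
                         sigma: float) -> float:
    """Log-likelihood of one round. ASSUMPTION: alpha_i = sigma * optimal_i."""
    alphas = [sigma * y_star for y_star in optimal]
    return dirichlet_log_density(observed, alphas)


def participant_log_likelihood(rounds, sigma: float) -> float:
    """Sum round contributions; `rounds` is a list of (observed, optimal) pairs."""
    return sum(round_log_likelihood(obs, opt, sigma) for obs, opt in rounds)


if __name__ == "__main__":
    equal = [1 / 3, 1 / 3, 1 / 3]
    print(round_log_likelihood([0.2, 0.3, 0.5], equal, sigma=3.0))
    print(round_log_likelihood(equal, equal, sigma=30.0))
```

In a full estimator, the optimal shares would themselves be functions of the preference parameters, and the log-likelihood would be maximised over those parameters and σ jointly.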

Equity-Efficiency Trade-Off
Using data from Treatment A, Figure 2 plots cumulative distribution functions (CDFs) of resource shares (ỹ i ) and health shares (h̃ i ) conditional on relative multipliers (p̃ i = p i / Σ p i ). 17 When p̃ i = 1/3 for any one individual, the absolute multipliers are equal across the three individuals in a round (see Table A1) and there is no equity-efficiency trade-off. In that case, almost all participants share resources, and therefore health, equally (ỹ i = h̃ i = 1/3). At other values of p̃ i , the resource share CDF has substantial density on either side of 1/3. Participants vary allocations in response to between-individual differences in the productivity of resources.
The direction and strength of the response vary between participants.

Figure 2: Distributions of resource and health shares by relative multipliers
Note: Empirical cumulative distribution functions of resource shares (ỹ i = y i / Σ y i ) and health shares (h̃ i = h i / Σ h i ) to individuals distinguished by relative multipliers (p̃ i = p i / Σ p i ). Data from all participants and rounds in Treatment A (n=10,110).
Individuals with p̃ i > 1/3 are more productive than average. At less than equal resource shares (ỹ i < 1/3), the CDFs for these individuals are to the left of the CDF for the average productivity individuals. This indicates inefficient allocations that do not maximise health.
The above average productivity individuals get less than an equal share of resources in more than three quarters (78.5%) of the allocations to them. In more than one third (34.6%) of allocations, resources are reduced to such an extent that the above average productivity individuals get an approximately equal share of health. In these cases, health maximisation is sacrificed to an extent sufficient to reach equality.
As p̃ i increases (further) above 1/3, the health share CDF shifts to the right at above equal shares. In almost two thirds of the respective cases, participants choose allocations that leave individuals who are more productive than average with better than average health.
In a substantial minority (17%) of the allocations to individuals who are more productive than average, they are given more resources. In these cases, efficiency is pursued despite the inequality it generates, but not to the extent of giving all resources to the most productive individual and so maximising aggregate health.
For individuals who are less productive than average (p̃ i < 1/3), the resource share CDFs lie to the left of the CDF for average productivity (p̃ i = 1/3) at less than equal shares. This indicates that some participants give fewer resources to less productive individuals. Very few give no resources to the less productive, which would be required to maximise aggregate health. Most are prepared to trade efficiency for less inequality by giving more resources to less productive individuals. As the relative multiplier falls further below 1/3, the resource share CDF shifts further to the right above equal shares. This prioritisation of the less productive is, in many cases, insufficient to prevent them from ending up with less than a one-third share of health: the health share CDFs for p̃ i < 1/3 have substantial density below a one-third share.
Pooled data regressions confirm that resource shares fall and health shares rise with increases in the relative multiplier (Appendix D.1). On average, participants compensate for lower productivity by allocating more resources, but not by enough to fully offset the productivity disadvantage. Participant-specific regressions reveal substantial heterogeneity (Appendix D.1). Approximately 14.8% prioritise efficiency by giving more resources, and therefore health, to individuals with higher multipliers. Around 6.2% of participants do not adjust resource allocations in response to the multiplier and so give more health to the more productive. Around a half (49.3%) sacrifice efficiency for less inequality by giving fewer resources to individuals with higher multipliers, while ensuring that these individuals end up with better than average health. Around 27.9% allocate resources to equalise health. 18
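The benchmark allocations referred to above (health maximisation, equal resource shares, and equal health shares) can be made concrete with a small Python sketch. It assumes a linear health production function, h i = p i × y i , which is an assumption made here purely for illustration; the function names are hypothetical.

```python
from typing import List, Sequence


def shares(values: Sequence[float]) -> List[float]:
    """Normalise values so that they sum to 1."""
    total = sum(values)
    return [v / total for v in values]


def health_from_allocation(resources: Sequence[float],
                           multipliers: Sequence[float]) -> List[float]:
    """ASSUMPTION: health is the multiplier times the resources received."""
    return [p * y for p, y in zip(multipliers, resources)]


def equal_health_allocation(budget: float, multipliers: Sequence[float]) -> List[float]:
    """Resources inversely proportional to multipliers, so health is equalised."""
    inverse = [1.0 / p for p in multipliers]
    return [budget * w for w in shares(inverse)]


def health_maximising_allocation(budget: float, multipliers: Sequence[float]) -> List[float]:
    """Give the whole budget to the most productive individual."""
    best = max(range(len(multipliers)), key=lambda i: multipliers[i])
    return [budget if i == best else 0.0 for i in range(len(multipliers))]


if __name__ == "__main__":
    p = [1.0, 2.0, 3.0]  # illustrative multipliers, not taken from the experiment
    for name, y in [("equal health", equal_health_allocation(12.0, p)),
                    ("equal resources", [4.0, 4.0, 4.0]),
                    ("health maximising", health_maximising_allocation(12.0, p))]:
        h = health_from_allocation(y, p)
        print(name, [round(s, 3) for s in shares(y)], [round(s, 3) for s in shares(h)], sum(h))
```

Under this assumption, equalising health requires giving the most productive individual less than an equal share of resources, at a cost in aggregate health. This is the trade-off that the observed allocations resolve in different ways.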

Prioritisation by Income
Using data from all choices in Treatment B, the left panel of Figure 3 shows the mean health shares for individuals ranked by income within each round. On average, the poorest individual receives the largest share (0.3511), while the richest gets the smallest share (0.3141).
The null of equal shares is rejected (p-value < 0.01). 19 The right panel of Figure 3 shows the distribution of participant-level estimates of the difference between the health shares given to the poorest and the richest individuals. A little less than a quarter (23.4%) of participants give a significantly larger share of health to the poorest, which appears in the figure as a positive difference. These participants drive the difference in the mean shares in the left panel. A large majority (71.5%) of participants do not discriminate significantly by income. A minority (5.0%) of participants are pro-rich: they give a significantly larger share of health to the richest individual.

Sensitivity to Causality
Using the data from Treatment C, Figure 4 shows CDFs of the health share to the poorest individual within each round stratified by values of the exogenously varying causal information and whether the poorest individual is the least (left) or most (right) productive.
Within each panel, the lack of any substantial differences between the CDFs indicates that participants generally do not adjust allocations, and so health shares, in response to information about the causal effect of income on the productivity of resources. On average across all participants, increasing the percentage of the multiplier differences that participants are told is caused by income from 0% to 100% has no impact whatsoever on the mean health share to the poorest when that individual is the least productive, and it raises the poorest individual's health share by 1.65% when that individual is the most productive (Appendix D.3 Table D3).
Around 45% of participants do not change the health share given to the poorest at all when the causal information is increased from 0% to 100% (Appendix D.3 Figure D2).

Pooled Estimates of Preference Parameters
Table 2 shows parameter estimates obtained from data that are pooled over allocations made by all participants in all rounds within each treatment. Using data from Treatment A only and imposing constant weights, we obtain ε̂ = 1.391 with a 95% confidence interval well above 1. Hence, the representative SDM is willing to sacrifice health maximisation for less inequality. 20 The point estimates of ε are slightly, but statistically significantly, larger when weights are allowed to depend on income (Treatment B) and its causal effect (Treatment C).
The Treatment B estimate of β is significantly larger than 1, implying that, on average, participants put greater weight on the health of poorer individuals. The magnitude of the estimate implies that the representative SDM would give the poorest individual in a population of three a weight that is 21.5% larger than the weight given to the richest individual. 21 This indicates substantial preference for a pro-poor distribution of health and implies greater aversion to health inequality when health differences are positively associated with income differences. The same inferences can be made from the fact that the Treatment C point estimate of β 1 is also significantly greater than 1.
20 The welfare loss generated by inequality is the difference between the mean and EDE health. For example, for three individuals with QALYs of 40, 60, and 80, the mean is 60, the EDE at ε = 1.391 is 56.78, and the welfare loss is 3.22 QALYs.
Note (Table 2): Estimates in each row are obtained from pooling data over all participants (n=337) and rounds within the respective treatment. Estimates maximise the total log-likelihood, $\sum_k LL_k$, with LL k defined in eq.(15) and the parameters defined in eq.(2) for ε, eq.(5) for β, and eq.(6) for β 1 and β 2 . The total log-likelihoods are 3308.2, 3626.5, and 2231.3 for treatments A, B, and C, respectively. Number of observations is 10,110, 10,110, and 6,066 in treatments A, B, and C, respectively. In brackets are 95% bootstrap confidence intervals obtained with the percentile method.
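The welfare-loss example in footnote 20 can be reproduced with a few lines of Python. The sketch assumes an Atkinson-type equally distributed equivalent (EDE) with equal weights, $\left(\frac{1}{n}\sum_i h_i^{1-\varepsilon}\right)^{1/(1-\varepsilon)}$. This functional form is an assumption here, since eq.(2) is not reproduced in this section, although it does return the figures quoted in the footnote. The function name is hypothetical.

```python
import math
from typing import Optional, Sequence


def atkinson_ede(health: Sequence[float],
                 epsilon: float,
                 weights: Optional[Sequence[float]] = None) -> float:
    """Equally distributed equivalent health under an Atkinson-type SWF."""
    n = len(health)
    if weights is None:
        weights = [1.0 / n] * n  # constant weights, as in Treatment A
    if abs(epsilon - 1.0) < 1e-12:
        # Limiting (Cobb-Douglas) case: weighted geometric mean.
        return math.exp(sum(w * math.log(h) for w, h in zip(weights, health)))
    power = 1.0 - epsilon
    return sum(w * h ** power for w, h in zip(weights, health)) ** (1.0 / power)


if __name__ == "__main__":
    qalys = [40.0, 60.0, 80.0]
    mean_health = sum(qalys) / len(qalys)
    ede = atkinson_ede(qalys, epsilon=1.391)
    print(round(ede, 2), round(mean_health - ede, 2))
```

With ε = 0 the EDE equals mean health, so the welfare loss from inequality is zero for a health maximiser.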
Treatment B estimates of the general model consisting of eq.(2) and eq.(5) imply rejection of both an Atkinson SWF that does not allow aversion to income-related health inequality (H 0 : β = 1) and Wagstaff's (2002) achievement index that does not allow aversion to pure health inequality (H 0 : ε = 0). Allowing the latter type of aversion gives the greater improvement in data fit. 22
Using Treatment C data, the point estimate of β 2 is very close to 1 and is not significantly different from this value. This implies that, on average, the weight given to poorer individuals is not dependent on whether the health-income association is causal. The representative SDM is not more averse to income-related health inequality when low income causes poor health. 23

Heterogeneous Estimates of Preference Parameters
Table 3 shows percentiles of the distribution of participant-specific estimates of each preference parameter. 24 It also shows estimates of the precision parameter, which are much larger than the respective pooled estimates, indicating that the participant-specific estimates are substantially more precise. This signals the importance of preference heterogeneity, which is evident for all parameters and is particularly marked for pure health inequality aversion.
The median estimates of ε imply greater aversion to pure health inequality than the respective pooled estimates. 25 Irrespective of the treatment data used, the median estimate is well above 1, indicating that a majority is substantially averse to inequality. The 10-90 percentile ranges and Figure 5, which plots the distribution of ε̂ from Treatment A, show extensive heterogeneity. While there are no health maximisers (ε̂ = 0), around one sixth (16.0%) of Treatment A participants have 0 < ε̂ ≤ 0.9. These Efficiency Seekers have only weak aversion to pure health inequality. About a tenth (9.8%) have approximately Cobb-Douglas preferences (Dolan, 1998): 0.9 < ε̂ < 1.1. More than two fifths (42.7%) display more strongly Prioritarian preferences (Parfit, 2000): 1.1 ≤ ε̂ < 15. A little less than a third (31.5%) exhibit preferences that approach Maximin: ε̂ ≥ 15. 26
The median estimates given in Table 3 of the income weight parameters that reflect aversion to income-related inequality (β in Treatment B and β 1 in Treatment C) are slightly smaller than the respective pooled estimates (Table 2). This indicates that the median participant is a little less pro-poor than the representative SDM captured by the pooled estimates.
23 To identify β 2 , we use the randomly generated causal information to fix λ in eq.(6). We confirm robustness to an alternative identification strategy that fixes λ with elicited beliefs. We asked each participant to express, on a [-100, 100] scale, the strength of their belief that income causally raises or lowers the multiplier (Appendix A.4 and Appendix G). After re-scaling to [-1, 1], this provides an estimate of λ for each participant that can be used with the Treatment B data to identify β 2 . This gives a pooled estimate of β̂ 2 = 1.001 [0.952-1.050]. The respective estimates of ε and β 1 are almost identical to the Treatment B estimates.
24 See Appendix D.5 for distributions of estimates and correlations between estimates, both within and between treatments.
25 For example, using data from Treatment A and the 3-person scenario from fn. 20, the welfare loss from inequality calculated with the median participant's ε̂ is 7.12, which is more than twice the loss obtained using the pooled estimate of ε (3.22).
Note (Table 3): Estimates maximise participant-specific log-likelihoods, eq.(15), with parameters defined in eq.(2) for ε, eq.(5) for β, and eq.(6) for β 1 and β 2 . Within each panel (A, B, C), top row gives median estimates and next row gives 10th and 90th percentiles in the respective distribution of participant-specific estimates. Each distribution has 337 estimates, one for each participant. In Treatments A and B, each estimate is obtained from 30 data points (10 rounds × 3 allocations per round). In Treatment C, each estimate is from 18 data points (6 rounds × 3 allocations per round).
The 10-90 percentile ranges and the left panel of Figure 6, which plots the distribution of β̂ from Treatment B, show heterogeneity in the income weights. For a majority of the sample, estimates of ε > 0 and β > 1 imply rejection of restrictions on the general model (eq.(2) and eq.(5)) that would give an Atkinson SWF with no aversion to income-related health inequality (β = 1) and an achievement index with no aversion to pure health inequality (ε = 0).
Prioritisation of the poor, or the rich, is constrained by aversion to pure health inequality.
When this aversion is stronger, an increase in pro-poor weights has a smaller impact on the optimal allocation of health to the poor. This is illustrated in the right panel of Figure 6, which uses Treatment B estimates to plot participants' optimal health shares to the poorest individual (h̃ * p ) against their β̂. Symbols and colours distinguish between participant categories defined by ε̂ as Efficiency Seeking, Cobb-Douglas, Prioritarian, and Maximin. We consider a case with constant multipliers to ensure that there is no equity-efficiency trade-off.
Hence, when β̂ ≈ 1, h̃ * p ≈ 1/3. That is, with income-neutral weights, the poorest individual optimally gets an equal share of health. As β̂ increases above 1, h̃ * p increases above 1/3, but clearly to a much greater extent for the Efficiency Seekers and Cobb-Douglas types. For the Maximin types, the optimal share hardly moves from 1/3 because their extreme aversion to pure health inequality constrains them from giving more health to the poor even when they are strongly inclined toward the poor. Efficiency Seekers are relatively unconcerned by pure health inequality, and since with equal multipliers there is no efficiency motivation to consider, more pro-poor income weights (higher β̂) strongly increase the optimal allocation to the poor.
Figure 6: Distribution of income weight parameter estimates, β̂
Notes: The left panel shows the distribution of participant-specific maximum likelihood estimates of β (eq.(5)) obtained from Treatment B data. There are 337 estimates, with 30 data points used to obtain each estimate. The distribution is shown as both a histogram, with density normalised to 1, and an empirical cumulative distribution plot. Values of β̂ > 3 censored at 3. Pro-poor is β̂ > 1. Pro-rich is β̂ < 1. The right panel plots the optimal health share to the poorest (assuming equal multipliers) against β̂, with symbols for categories of health inequality aversion. Efficiency Seeking, Cobb-Douglas, Prioritarian, and Maximin correspond to ε̂ ≤ 0.9, 0.9 < ε̂ < 1.1, 1.1 ≤ ε̂ < 15, and ε̂ ≥ 15, respectively.
In Table 3, the median estimate of the causal income weight parameter β 2 is even closer to 1 than the pooled estimate in Table 2. Both estimates, consistent with the non-parametric analysis, indicate that, on average, aversion to income-related health inequality does not strengthen when low income is known to cause poor health. However, the 10-90 percentile range for β̂ 2 implies that there are participants who give causality-dependent income weights.
For a little less than a third (30.6%), β̂ 2 > 1.05 (Appendix D.5, Figure D3). These participants would appear to place greater weight on the health of poorer individuals after learning that lower income causes worse health (Table C1). Such causal information would seem to lead the almost two fifths (38.9%) of participants with β̂ 2 < 0.95 to revise the weights in the opposite direction. Against such interpretations based on point estimates, a restricted model that imposes β 2 = 1 fits the Treatment C data better than the unrestricted model using either pooled or heterogeneous estimates (Appendix E.2).
Participant-specific estimates of each of ε and σ are strongly and significantly correlated between the treatments (Appendix D.5 Table D5). This demonstration of within-participant consistency across the three treatments lends face validity to the analysis. Further evidence of consistency is a strong, positive correlation between estimates of β from Treatment B and β 1 from Treatment C (Table D5). Interestingly, estimates of ε from Treatment A are positively and significantly correlated with estimates of β from Treatment B, while within Treatment B the correlation between the estimates of these two parameters is weaker and not significant. This supports the contention that there is confounding of the two types of health inequality aversion when they are not estimated simultaneously. Within each treatment, there is a positive correlation between ε and σ, which is partly due to low noise in the allocations of maximin types who always opt for an equal distribution of health.

Model Comparisons
Table 4 shows pooled and heterogeneous parameter estimates and goodness-of-fit (GOF) statistics for our main specification (Atkinson SWF, eq.(2), and income-rank-dependent weights, eq.(5)) and for two alternative SWFs (Kolm, eq.(9), and non-consequentialist, eq.(7)) and weighting functions (income-share-dependent, eq.(11), and constant, eq.(4)). We obtain all estimates with Treatment B data and show medians of the heterogeneous estimates.
The heterogeneous estimates give a much better fit to the data than the pooled estimates irrespective of the GOF measure and the specification (see also Appendix E). For both pooled and heterogeneous estimates, all GOF measures indicate that the Atkinson SWF is strongly preferred to the non-consequentialist SWF and is slightly preferred to the Kolm SWF (see also Appendix E). The estimates, particularly the pooled ones, indicate that aversion to resource inequality, which is captured by the estimate of ε with a non-consequentialist SWF, is weaker than the main specification estimate of health inequality aversion. Using the pooled estimates, we reject the null of no aversion to pure health inequality for both the Atkinson SWF (ε = 0) and the Kolm SWF (θ = 0). 27
Specification of the income weights as rank-dependent versus share-dependent does not affect the GOF as much as the specification of the SWF. However, allowing for some form of income-related weights improves the GOF compared with the restricted model that imposes constant weights (see also Appendix E). Specification of the SWF as Atkinson versus Kolm has little effect on the pooled and median estimates of the income-dependent weight parameter. In both cases, using the pooled estimates, we reject the null of constant weights (β = 1) in favour of weights that decrease with rising income (β > 1). 28
In sum, the evidence supports health consequentialism, pure health inequality aversion, and pro-poor income-dependent weights. The data are not definitively more consistent with aversion to relative or absolute health inequality and the evidence is not decisively in favour of rank- or share-dependent income weights. Our main specification of an Atkinson SWF with rank-dependent income weights fits the data at least as well as all others considered.
Note (Table 4): Main is Atkinson SWF (eq.(2)) and income-rank-dependent weights (eq.(5)). Inequality aversion is ε in eq.(2) and eq.(7) for Main and Non-consequentialist, respectively, and θ in eq.(9) for Kolm. Income weight is β in eq.(5) for Main, Kolm, and Non-consequentialist, and is γ in eq.(11) for Share. Constant is eq.(4) weight function. MPL is what we call the Mean Proportional Likelihood. Define $PL_t = L_t / (L_t + L_t^U)$, where $L_t$ is the likelihood in round t for the data and estimates and $L_t^U$ is a likelihood for a uniform distribution draw. $MPL = \frac{1}{T}\sum_{t=1}^{T} PL_t$. If MPL = 0.5, model fit to data is no better than the fit to uniform distribution draws. As MPL → 1, data fit improves. $MSE = \frac{1}{T}\sum_{t=1}^{T} (\tilde{y}_i - \tilde{y}_i^*)^2$ is the Mean Square Error. $AIC = 2k - 2\ln(L)$ is the Akaike Information Criterion, with k the number of parameters. GOF is increasing with MPL and decreasing with MSE and AIC.
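The three goodness-of-fit measures defined in the note to Table 4 are simple to compute once round-level likelihoods and optimal shares are available. The Python sketch below is an illustration under assumptions: the inputs are placeholders supplied by the user, the function names are hypothetical, and how the uniform-draw likelihood is obtained is left outside the sketch.

```python
from typing import Sequence


def mean_proportional_likelihood(model_likelihoods: Sequence[float],
                                 uniform_likelihoods: Sequence[float]) -> float:
    """MPL: mean over rounds of L_t / (L_t + L_t^U)."""
    if len(model_likelihoods) != len(uniform_likelihoods):
        raise ValueError("inputs must have one entry per round")
    ratios = [l / (l + u) for l, u in zip(model_likelihoods, uniform_likelihoods)]
    return sum(ratios) / len(ratios)


def mean_squared_error(observed_shares: Sequence[float],
                       optimal_shares: Sequence[float]) -> float:
    """MSE between observed and model-implied optimal resource shares."""
    if len(observed_shares) != len(optimal_shares):
        raise ValueError("inputs must have the same length")
    errors = [(y - y_star) ** 2 for y, y_star in zip(observed_shares, optimal_shares)]
    return sum(errors) / len(errors)


def akaike_information_criterion(num_parameters: int, log_likelihood: float) -> float:
    """AIC = 2k - 2 ln(L), where ln(L) is the maximised log-likelihood."""
    return 2 * num_parameters - 2 * log_likelihood


if __name__ == "__main__":
    # Placeholder values chosen only to show the calls.
    print(mean_proportional_likelihood([2.0, 3.0], [1.0, 1.0]))
    print(mean_squared_error([0.30, 0.40, 0.30], [1 / 3, 1 / 3, 1 / 3]))
    print(akaike_information_criterion(3, 10.0))
```

An MPL of 0.5 arises when the model likelihood equals the uniform-draw likelihood in every round, which is the "no better than uniform" benchmark described in the note.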

Illustrative Application
Our estimates of heterogeneous SWF parameters obtained from a UK representative sample can be used in policy evaluation. They can be applied to an estimated policy-specific distribution of health over individuals or groups, which may be ordered by income, to simulate the distribution of support for the respective policy. 29 To illustrate this potential, we simulate a population of 100,000 individuals characterised by income and health. The annual income of each individual is a random draw from a (rescaled) beta distribution with the mean (£34,281) and standard deviation (£30,052) set to be broadly consistent with the respective values for the UK income distribution. Health (QALYs) is a positive and stochastic function of log income. 30 We evaluate two policies that have the same cost and do not change the distribution of income. Policy A produces more health for poorer individuals and so results in less pure health inequality and less income-related health inequality than Policy B, which gives a higher mean level of health. 31 The top panel of Table 5 shows, for each policy, the mean and standard deviation of health and its correlation with income. 32 Policy B is preferred by standard economic evaluation that only considers the impact on mean health.
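The income draw described in footnote 30 can be sketched in Python as follows. Only the income step is shown, because the health equation in that footnote is incomplete in this text. The seed, the function names, and the use of the standard library generator are choices made here for illustration, not the authors' simulation code.

```python
import random
import statistics
from typing import List, Tuple

INCOME_SCALE = 360_000          # rescaling factor from footnote 30
SHAPE_A, SHAPE_B = 1.1, 10.5    # Beta shape parameters from footnote 30


def simulate_incomes(n: int, seed: int = 0) -> List[float]:
    """Draw n incomes as INCOME_SCALE * Beta(SHAPE_A, SHAPE_B)."""
    rng = random.Random(seed)  # seeded for reproducibility (an assumption of this sketch)
    return [INCOME_SCALE * rng.betavariate(SHAPE_A, SHAPE_B) for _ in range(n)]


def theoretical_mean_sd() -> Tuple[float, float]:
    """Mean and standard deviation implied by the rescaled Beta distribution."""
    total = SHAPE_A + SHAPE_B
    mean = SHAPE_A / total
    variance = SHAPE_A * SHAPE_B / (total ** 2 * (total + 1))
    return INCOME_SCALE * mean, INCOME_SCALE * variance ** 0.5


if __name__ == "__main__":
    incomes = simulate_incomes(100_000, seed=1)
    print(round(statistics.fmean(incomes)), round(statistics.pstdev(incomes)))
    print(tuple(round(v) for v in theoretical_mean_sd()))
```

The implied theoretical mean and standard deviation are of the same order as the values reported in the text, in line with the description of the moments as broadly consistent with the UK income distribution.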
The bottom panel shows EDE health using our estimates of the Atkinson SWF without and with income weights. In the first case, Policy B is still preferred if we use the pooled estimate. The health inequality aversion of the representative SDM is insufficient for the greater inequality generated by Policy B to outweigh the higher mean it achieves and so tilt the balance in favour of Policy A. However, the median of the heterogeneous EDE estimates is larger with A. For more than half (51.3%) of the representative sample, Policy A gives the larger EDE health and so this policy would be chosen under simple majority voting.
29 Appendix F explains how to access and use our estimates, at https://doi.org/10.17632/9vy6f6g5k3.1. If grouped (by income) data are used, then the weighting parameters within the SWF would be applied to the proportion of the population in each group. 30 We use x i ∼ Beta(1.1, 10.5)×360000 to generate the distribution of income and derive from it a baseline distribution of health by setting h i = 45 + 2log( . Policy A allocates proportionately less resources to individuals with higher income rank, while Policy B allocates proportionately more. The productivity of these resources in determining health is a positive function of log income.
32 Appendix F Figure F1 shows the simulated marginal and joint distributions of income and health for each policy.
The second scenario presented in the bottom panel allows for aversion to income-related health inequality through income-rank-dependent weights. Using the pooled estimates of ε and β from Treatment B, we infer that the representative SDM would prefer Policy A.
Adding aversion to positive health-income correlation to even moderate pure health inequality aversion is sufficient to tilt the balance in favour of A for the representative SDM, despite the higher mean generated by B. Using the heterogeneous estimates, preference for Policy A is even more emphatic. It would be the choice of 70.9% of the representative sample.
To take account of policy impacts on pure health inequality, the pooled or median estimate of ε is all that is needed to add distributional sensitivity to standard economic evaluation. As the above example demonstrates, the consequence of this extension for the choice of policy can depend on whether a pooled or median estimate is used. Our approach allows examination of variation in support for a policy along the distribution of estimates.
When attention is paid to income-related health inequality aversion, the pooled estimates of ε and β remain sufficient to rank any set of health outcome distributions generated by alternative policies provided the preferences of a representative SDM are considered relevant.
When opting to use heterogeneous estimates, the medians of two parameters are not enough.
In that case, the analyst must use the entire joint distribution ofε andβ that we provide.

Discussion
Standard economic evaluation of healthcare pursues an objective, health maximisation, that is inconsistent with the social preferences we elicit from a representative sample of the UK population. On average, people are willing to sacrifice efficiency in health production for less inequality. They also prioritise the health of poorer individuals.
There is substantial heterogeneity in social preferences over the distribution of health. A pooled estimate understates the extent to which most people would sacrifice maximisation of aggregate health to reduce inequality. Our median estimate of aversion to pure health inequality (ε = 3.5), which is estimated simultaneously with income weights, is smaller than previous UK estimates that potentially confound this aversion with aversion to income-related health inequality (Dolan and Tsuchiya, 2011; Robson et al., 2017; McNamara et al., 2020). Our median estimate is larger than the median interval estimate (ε = 1.0−1.5) obtained from a representative sample in Ontario (Hurley, Mentzakis, and Walli-Attaei, 2020), although ε̂ > 3 for 48% of that sample. Our median estimate is also within the range of median estimates (2.24 < ε̂ < 4.85) identified from a sample of Portuguese college students (Pinho and Botelho, 2018).
In addition to aversion to pure health inequality, we find that, on average, there is prioritisation of the health of poorer individuals. Consequently, aversion to pro-rich health inequality is greater than aversion to pure health inequality. However, both pooled and median estimates indicate only slightly larger weights on the health of poorer individuals (β slightly above 1). The weights are less pro-poor than those imposed by the standard concentration index measure of income-related health inequality (β = 2) (Wagstaff, Paci, and Van Doorslaer, 1991; O'Donnell et al., 2008). This appears somewhat inconsistent with the Ontario study that finds a median degree of aversion to income-related health inequality closer to that built into the concentration index (1.5 < β̂ < 2) (Hurley, Mentzakis, and Walli-Attaei, 2020). Another study finds that, if anything, the degree of aversion implicit in the concentration index understates that of the median person in Sweden (2 < β̂ < 3) (Hardardottir, Gerdtham, and Wengström, 2021). The discrepancy between our estimates and these others is consistent with our hypothesis that studies that impose a positive correlation between health and income and do not elicit income weights simultaneously with aversion to pure health inequality will obtain upwardly biased estimates of willingness to prioritise the health of poorer individuals. In these studies, elicited aversion to differences in health by income also reflects aversion to differences in health per se.
In our approach, income weights have less impact on the allocation of health resources when there is stronger aversion to pure health inequality. A social decision maker who is less tolerant of that inequality allocates more resources to the less healthy. Indirectly, this increases the allocation to poorer individuals when health and income are positively correlated, as typically they are. This reduces the need for and marginal effect of pro-poor weights. Effectively, aversion to pure health inequality substitutes for the weights in raising the socially preferred health of poorer individuals. This explains the discrepancy between our and other estimates of the income-weight parameter.
The income-weight and pure health inequality aversion parameters jointly determine aversion to income-related health inequality. To illustrate, consider the marginal rate of substitution (MRS) of a poor individual's health (h_P) for a rich individual's health (h_R) with social welfare given by eq. (2):

The relative amount of QALYs a rich individual must gain in order to offset a reduction in the QALYs of a poor individual, such that social welfare is constant, depends not only on the relative income weights, ω_P/ω_R, and so the parameter β in eq.(5), but also on the relative health inequality, h_P/h_R, and the pure health inequality aversion parameter, ε. Table 6 shows the MRS for a two-person society and for configurations of social preferences and five distributions of health (QALYs). The top row shows the preferences of a health maximiser with no aversion to pure or income-related health inequality. In that case, the health of rich and poor individuals are always perfect 1:1 substitutes. The second row corresponds to the case in which there is no aversion to pure health inequality (ε = 0) and so aversion to income-related health inequality is entirely determined by the income weight parameter, which we set to the value imposed in the standard concentration and achievement indices (β = 2). In this case, the rich individual must always gain 3 times the number of QALYs to compensate for the poor individual's loss of QALYs, irrespective of their levels of health.

Note: Two-person society with health measured in QALYs. Marginal rate of substitution (MRS) calculated for various social preferences and health distributions from the equation given in text. "None" shows the MRS of a health maximiser. "Income only" shows MRS for social welfare given by eq.(2), with ε = 0, and income weights from eq.(5), with β = 2. "Pure only" shows MRS derived from eq.(2) with ε set to the median participant-specific estimate from Treatment A, ε_0.5 = 3.2, and constant weights, eq.(4). "Both" gives the sample median MRS derived from eq.(2) and eq.(5) using participant-specific estimates ε and β from Treatment B.
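The way the two parameters interact in the MRS can be explored with a minimal sketch. It rests on two assumptions that are not confirmed by the text above: that eq.(2) has the standard isoelastic form, so that the MRS equals (ω_P/ω_R)(h_P/h_R)^(−ε), and that the income weights of eq.(5) are rank-based weights of the extended concentration index type. The function names and the health levels used are hypothetical, chosen only for illustration.

```python
def income_weights(beta: float, n: int = 2) -> list[float]:
    """Hypothetical rank-based weights, poorest first.

    Assumed form (not taken from the paper): w_i = beta * (1 - R_i) ** (beta - 1),
    with fractional income rank R_i = (i - 0.5) / n.
    """
    return [beta * (1 - (i - 0.5) / n) ** (beta - 1) for i in range(1, n + 1)]


def mrs_poor_for_rich(h_poor: float, h_rich: float, eps: float, beta: float) -> float:
    """QALYs the rich person must gain to offset a one-QALY loss of the poor person.

    Assumes isoelastic social welfare, so MRS = (w_P / w_R) * (h_P / h_R) ** (-eps).
    """
    w_poor, w_rich = income_weights(beta, n=2)
    return (w_poor / w_rich) * (h_poor / h_rich) ** (-eps)


# Illustrative (hypothetical) health levels in QALYs: the poor person is less healthy.
h_poor, h_rich = 60.0, 70.0

print("None:       ", mrs_poor_for_rich(h_poor, h_rich, eps=0.0, beta=1.0))
print("Income only:", mrs_poor_for_rich(h_poor, h_rich, eps=0.0, beta=2.0))
print("Pure only:  ", mrs_poor_for_rich(h_poor, h_rich, eps=3.2, beta=1.0))
```

Under these assumptions, setting ε = 0 and β = 2 returns an MRS of 3 for any pair of health levels, in line with the second row described above, while with β = 1 the MRS exceeds 1 only when the poor individual is also the less healthy one.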
In the third row, there is no explicit prioritisation of the poor person's health (β = 1). Aversion to income-related health inequality arises indirectly through aversion to pure health inequality (the median participant-specific estimate from Treatment A, ε_0.5) and any association between health and income. Even at the most extreme pro-rich health inequality considered, the MRS is less than three fifths of that implied by the achievement index scenario (ε = 0, β = 2). As pro-rich health inequality falls in magnitude and then turns to pro-poor inequality, the MRS diverges further from that of the achievement index case.
In the bottom row, aversion to income-related health inequality arises directly through non-constant income weights and aversion to pure health inequality. For this case, we show the median MRS obtained from the distributions of ε and β estimated from Treatment B. At the most extreme pro-rich inequality, the MRS is very close to that implied by the achievement index scenario. This illustrates that despite our median estimate of the income weight parameter (β) being smaller than respective estimates obtained by others (

There is heterogeneity in the prioritisation of health by income. Our non-parametric and parametric analyses suggest that a quarter to a half of the UK population is pro-poor, while somewhere between a twentieth and a little less than a quarter is pro-rich. The preferences of the latter group are entirely inconsistent with the normative foundation of concentration and achievement indices (Wagstaff, 2002; Bleichrodt and van Doorslaer, 2006; Erreygers, Clarke, and Van Ourti, 2012). We are not the first to estimate that a sizeable proportion of a population would prioritise the health of richer people. Hardardottir, Gerdtham, and Wengström (2021) find that slightly more than one quarter of a representative Swedish sample displays a pro-rich bias, while Hurley, Mentzakis, and Walli-Attaei (2020) estimate that a little less than one fifth of Ontarians are pro-rich. These preferences are consistent with the marginal utility of health increasing with income, which, in turn, is implied by positive dependence of the marginal utility of income (consumption) on health. There is some empirical support for the latter (Finkelstein, Luttmer, and Notowidigdo, 2013), although the evidence is mixed (De Nardi, French, and Jones, 2010). Some may choose to allocate more health resources to richer individuals because higher income is perceived to offer greater opportunity to get the most from good health. This would be consistent with maximisation of aggregate well-being defined over health and income, with positive interaction between these two arguments. Our set up does not allow for such interdependence.
Another limitation is that to keep the experiment task cognitively feasible for a general population sample, we used a linear health production function. This sharpens the tradeoff between efficiency and equity. It may also increase the estimated aversion to health inequality. Without diminishing marginal product, maximisation of aggregate health requires the allocation of all resources to the most productive individual, which would also maximise health inequality. If there were diminishing returns to health resources, less aggregate health would need to be sacrificed to satisfy preference for lower inequality. The design of an experiment that allows diminishing returns and yet remains feasible remains a challenge.
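The corner solution implied by a linear health production function can be seen in a minimal sketch. It assumes Health = Multiplier × Resources, as in the experiment, and uses illustrative multipliers (1, 0.5, 0.33) with a budget of 100. The function names are hypothetical and the two allocation rules are stylised extremes, not estimated preferences.

```python
def allocate_max_health(budget: float, multipliers: list[float]) -> list[float]:
    """Give the whole budget to the individual with the highest multiplier."""
    best = max(range(len(multipliers)), key=lambda i: multipliers[i])
    return [budget if i == best else 0.0 for i in range(len(multipliers))]


def allocate_equal_health(budget: float, multipliers: list[float]) -> list[float]:
    """Give resources in inverse proportion to multipliers so health is equalised."""
    inverse = [1.0 / p for p in multipliers]
    total = sum(inverse)
    return [budget * v / total for v in inverse]


def summarise(resources: list[float], multipliers: list[float]) -> tuple[float, float]:
    """Return (total health, health gap) under linear production h_i = p_i * y_i."""
    health = [p * y for p, y in zip(multipliers, resources)]
    return sum(health), max(health) - min(health)


multipliers = [1.0, 0.5, 0.33]
budget = 100.0

for name, rule in [("max health", allocate_max_health), ("equal health", allocate_equal_health)]:
    total, gap = summarise(rule(budget, multipliers), multipliers)
    print(f"{name}: total health = {total:.2f}, health gap = {gap:.2f}")
```

With these numbers, maximising total health concentrates all resources on one individual and produces the largest possible health gap, whereas equalising health roughly halves total health. This is the sharpened efficiency-equity tradeoff described above.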
Our experimental manipulation of income-health causality did not deliver clear evidence that causality strengthens aversion to income-related health inequality. However, prioritisation of the health of poorer individuals is associated with beliefs about causality. Additional analysis reported in Appendix G shows that participants tend to give the poorest individual a larger share of health when they believe that a larger fraction of that individual's low potential health is caused by low income. This is merely descriptive evidence because the beliefs, unlike the causal information given in Treatment C, were not randomly assigned. Nonetheless, it is consistent with beliefs about causality conditioning aversion to income-related health inequality. The lack of strong support for this hypothesis from Treatment C could possibly be because participants perceive distributive justice through the lens of equality of opportunity (Roemer, 2002) and view high income, and any health advantage it bestows, as a just reward for effort exerted to increase income. Another possible explanation is that aversion to income-related health inequality arises from concern about deprivation in multiple dimensions of well-being irrespective of whether one dimension (income) has a causal effect on another (health).

Conclusion
Our novel experiment and estimation strategy make it possible to disentangle aversion to pure health inequality from aversion to income-related health inequality. The approach could be used to estimate aversion to health inequality related to any non-health characteristic. Our findings cast doubt on the normative principles that underpin standard practice in health economic evaluation and the measurement of income-related health inequality. Strengthening of the normative foundations of these health economics methods needs to take account of the substantial heterogeneity we reveal in social preferences over the distribution of health. This is feasible with the distributions of estimated social preference parameters we provide.

APPENDICES

Appendix A Experiment Details

Appendix A.1 Experiment Overview
An overview of the experiment is shown in Figure A1. The experiment was run over two sessions, with instructions, three treatments, a belief elicitation and two questionnaires. The order in which participants went through the experiment is indicated by the arrows. The median times participants took to complete each section are shown in minutes, in the bottom right corners. All participants went through all sections.

Figure A1: Experiment Overview

Appendix A.2 Instructions and Tutorial Script
The text for the instructions, tutorial and tutorial questions below is shown to all participants in Session 1, on screen within the experiment. The instructions give an overview of the experiment to come. The six stages of a tutorial explain how to use the on-screen interface; each of the scripts is followed by an interactive on-screen tutorial. Finally, five tutorial questions are presented to check and reinforce understanding.

Instructions
Welcome. Thank you for taking part today.
Please Read These Instructions Carefully.
You will be asked to make decisions which determine the health of hypothetical individuals in society.
You will be given a "Budget" that you must divide between these individuals. The Budget is the total amount of "Resources" available to spend.
Resources determine "Health". Health is the number of years a person lives, adjusted for illness or disability. For example, consider someone who reached the age of 70 without any illness or disability, who then lived for a further 10 years with an illness which reduced their quality of life to half of what it was before. That person might be said to have lived for the equivalent of 75 years in full health. For shorthand, we refer to this as "Health".

Giving more Resources to an individual increases their Health. The impact of Resources on Health is determined by a number referred to as the "Multiplier". The higher the Multiplier, the higher the level of Health achieved from a given number of Resources.
On the screen, you will distribute Resources between three Individuals. You will do this a number of times. Each screen will show a different scenario. The choices you make on one screen will not affect the scenarios that follow.
There are no right or wrong answers. We are interested in the choices you make, whatever they are.
You will now go through a tutorial, which will explain how to use the computer interface and the exact nature of the experiment.
Please click Next to continue.

Tutorial 1
This tutorial will show you how to use the on-screen interface.
You will first get practice in giving Resources to only one individual, who is identified by initials (e.g. MR). Drag the horizontal slider at the bottom of the next screen to the right to give more Resources to the individual. The Resources you give to an individual are taken from the Budget, which is shown on the left of the screen. As you increase the Resources, the Remaining Budget will decrease.
You must always use all of the Budget, so that the Remaining Budget is zero. When there is only one individual, this means dragging the slider all the way to the right. Later you will have to distribute the Budget between individuals.
Press Next to try out the slider. When you are done, allocate all of the Budget (100) and press Next.

Tutorial 2
The Resources you give to an individual determines their "Health". Health is the number of years a person lives, adjusted for illness or disability. Health is equal to the Resources multiplied by a number we call the "Multiplier".
The Multiplier is shown in the table at the top of the screen. When you give Resources by moving the slider, the resulting Health is shown by the height of the grey bar. The number to the right of this bar is the amount of Health, and this is also shown in the table at the top of the screen.
For this first individual the Multiplier is 1. So, if you give all of the Budget of 100 to the individual, their Health will be 100. They will live 100 years in full health.
Press Next and see how Health changes as you adjust the Resources given to the individual. When you are done, allocate all of the Budget and press Next.

Tutorial 3
The Multipliers can vary from individual to individual. In the previous scenario, the Multiplier was 1. In the next scenario, it is 0.5.
Press Next and see how Health changes as you give more Resources to this individual. Notice that there is now a gap between the Resources given (blue bar) and the Health achieved (grey bar). If you give all the Budget of 100 to this individual, their Health will be 50. They will live 50 years in full health.
When you are done, allocate all of the Budget and press Next.

Tutorial 4
In each round of the experiment, there are three individuals. Individuals are identified by their initials (e.g. MR, TO and OD) and change between rounds.
On the next screen, there are three sliders at the bottom of the screen that you can use to give Resources to each individual and so determine their Health. When you have used all of the Budget on the next screen, press Next.

Tutorial 5
The three individuals change from round to round.
On the previous screen, all three individuals had a Multiplier of 1. But the Multipliers can differ between individuals, as on the next screen.
Move the sliders to give Resources to the three individuals and notice how the Health achieved depends on the Multiplier of each individual. If you give the three individuals the same Resources, their Health will differ.
Take note of the size of the Budget, which can change from screen to screen.
If you are having difficulty seeing both the table and the graph on your screen, zoom out on your web browser by holding "Ctrl" and pressing "-". Hold "Ctrl" and press "+" to zoom in.
Press Next and then give Resources to the three individuals. When the Remaining Budget is zero, press Next.

Tutorial 6
The right of the screen shows further information.
"Resource Gap" is the gap between the largest and smallest amounts of Resources you give to the individuals.
"Total Health" is the total amount of Health of the three individuals (e.g. Health to AC + Health to TD + Health to RC).
"Health Gap" is the gap between the largest and smallest amounts of Health achieved by the individuals.
If you have used the whole Budget, you will not be able to move any slider to the right.
If you want to give more Resources to one individual, you will need to give less to another individual first.
Press Next and then distribute Resources across the three individuals. When you have allocated all of the Budget, press Next.

Tutorial Questions
Following the tutorial, participants answered five questions to reinforce and check understanding. The questions are shown below, with the correct response in bold. After submitting answers participants were given feedback about the correct response for each question.
1. On each screen, you will give Resources to how many individuals? -Options: 2; 3; 4; Not Sure.

In Session 2, a modified version of the above instructions and tutorial is shown, first to remind participants of the experiment and second to highlight the additional information on the income of each individual. They are told that income is an individual's "annual personal income (before tax) in pounds", which is shown in the label for each individual.

Appendix A.3 Multipliers
Table A1 shows the multipliers and relative multipliers used for Treatment A and Treatment B. There are 10 rounds within each treatment; the order is randomised between participants. Table A2 shows the multipliers and causal information shown to participants in Treatment C. The multipliers are orthogonal to the screen position, income rank and causal information.

Note: The randomly assigned incomes D, E, F and X, Y, Z are identical to those randomly drawn for the corresponding rounds (7 & 8) in Treatment B, either increasing or decreasing. Causal information shows the information provided on the percentage of the differences in the multipliers caused by income differences. 20-80% is a random draw from {20, 40, 60, 80}.

Appendix A.4 Belief Elicitation
Before participants began Treatment C, three questions relating to their beliefs on the causal relationship between Income and Health (via the Multipliers) were asked. Question 1 asks whether, generally, participants believe that Multipliers would be Pro-Poor, Neutral or Pro-Rich. Question 2, split into 2a and 2b, gives two specific sets of Multipliers, (1, 0.5, 0.33) and (0.33, 0.5, 1), and asks the extent to which they believe the differences in Multipliers are caused by Income differences. The order of Pro-Poor and Pro-Rich multipliers is randomised between participants, as is the screen order of poor to rich income individuals. For both questions interactive sliders are used. The script, alongside example screenshots, is below.

Question 1
The Health gained from an amount of Resources is determined by the value of the Multiplier.
The Multipliers can differ between individuals. Resources can improve the Health of some people a lot, while improving the Health of other people much less. These differences can arise by chance. They may also be caused by characteristics of individuals.
Consider three individuals with different incomes. If income did not affect the Multipliers, they might look like this.
Do you expect that an individual's income would also affect the size of the Multipliers?
If you would expect higher income to increase the Multipliers, then move the slider to the right. Observe that the richer the individual, the greater is the increase in the multiplier.
Move the slider further to the right if you expect income to cause larger differences in the Multipliers.
If you would expect lower income to increase the Multiplier, then move the slider to the left.
Observe that the poorer the individual, the greater is the increase in multiplier. Move the slider further to the left if you expect income to cause larger differences in the Multipliers.
If you would expect income to have no effect on the Multiplier, then leave the slider in the middle.
Once you have made your choice, please press Next.

Question 2a/b
Now, imagine that the Multipliers of the three individuals are as shown below.
The differences between the Multipliers may be partly caused by the differences in incomes.
However, the Multipliers may also differ by chance, or because of other characteristics of these individuals.
Move the slider below to show how much of the differences between the Multipliers you would expect to be caused by the differences in incomes.
Move the slider all the way to the left if you would expect NONE (0%) of the differences in the Multipliers to be caused by the differences in incomes.
Move the slider all the way to the right if you expect ALL (100%) of the differences in the Multipliers to be caused by the differences in incomes.
Move the slider to a point between these extremes to show the percentage of the differences in the Multipliers that you would expect to be caused by the differences in incomes.
Once you have made your choice, please press Next.

Appendix A.5 Treatment C
An example script and set of screenshots for Treatment C are shown below. Here, the participant is shown the Health allocations they made, in the round where the Multipliers were 0.33, 0.5 and 1, for Individuals who earn £5,000, £50,000 and £100,000, respectively.
They are asked to imagine that these income differences caused 20% of the differences in the Multipliers. With this new information they are asked if (and how) they would reallocate resources differently.
In that scenario you allocated resources and health as follows:

Imagine that the differences in income caused 20% of the differences in the Multipliers. Any remainder of the differences were caused by chance and the other characteristics of these individuals.
Then, the Multipliers would be as below, where the light grey part of the bars represents the part of the Multiplier caused by income differences. If there is no light grey bar, this means that none of the differences are caused by differences in income.
Given this new information, would you change how you distributed the Resources between the three individuals?
Press Next and make any changes on the next screen.

Appendix A.6 Pilot Experiments
Prior to our main experiment, two pilot experiments were run. The first was a laboratory experiment with a student sample (n=32) at Erasmus University Rotterdam, held over two sessions on the 28th of November 2019 at the ESE-econlab. The second was an online experiment with a UK adult sample (n=27) recruited via Prolific, which ran on the 25th of October 2021. As with the main experiment, the experimental interface was designed in R Shiny.
The first pilot was run primarily to test the experimental design. To test comprehension and allow for the improvement of the experimental design, survey questions were included to ask: 1) how difficult participants found the experiment, 2) what was most difficult, and 3) what could be improved. As a result of these comments, changes were made to the instructions, interactive tutorial, experimental display, error messages and the questionnaire. To test the experimental design parameters, participant-level preference parameters were estimated. Only minor changes were made to increase the variation in multipliers in Treatment A and B. More major changes were needed for Treatment C: first, the addition of the causal belief questions and, second, an overhaul of Treatment C to ensure the causal λ parameters were orthogonal to the individual multipliers, p_i.
The second pilot aided the transition from a laboratory experiment, with a student sample, to an online experiment with a general population. The experiment was changed from one one-hour session to two 30-minute sessions, the language in the tutorial was further simplified and an additional battery of demographic questions was added. Further changes were also needed to run the experiment through Prolific.

Appendix B Descriptive Statistics
Table B1 shows the characteristics of the 337 participants included in our main analysis. Extensive information from our questionnaire is included, relating to: demographic characteristics, socioeconomic status, health, COVID-19, economic preferences, views and beliefs.

Appendix C Weights, Causality Beliefs and Attitudes
Equation (6) allows for a range of beliefs and attitudes concerning income-caused health inequality. Table C1 provides a summary of the consequences of combinations of beliefs, represented by λ, and attitudes, represented by β_2, for the weights given to the income-poor, which are determined, in part, by β_1 β_2^λ. λ > 0, λ < 0, and λ = 0 indicate beliefs that the causal effect of income on health is positive, negative, and zero, respectively. β_2 > 1, β_2 < 1, and β_2 = 1 indicate aversion, inclination, and indifference to income-caused health inequality.

Table C1: Effect of causality beliefs and attitudes on weights to income-poor

Note: Increase (Decrease) indicates cases in which allowing for causality in the income-health relationship increases (decreases) weights on the health of the income-poor. Unchanged indicates cases in which allowing for causality has no impact on the weights on the income-poor.
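The combinations of beliefs and attitudes can be enumerated with a minimal sketch. It assumes, as a reading of the text rather than of eq.(6) itself, that causality enters only through the factor β_2^λ and that a factor above 1 raises the weight on the income-poor. The function name and the parameter values are hypothetical.

```python
def effect_on_poor_weight(beta2: float, lam: float, tol: float = 1e-12) -> str:
    """Classify the effect of allowing for causality on the weight on the income-poor.

    Assumption (not confirmed by eq.(6)): the pro-poor parameter is scaled by
    beta2 ** lam, so a factor above 1 increases the weight on the income-poor.
    """
    if beta2 <= 0:
        raise ValueError("beta2 must be positive")
    factor = beta2 ** lam
    if abs(factor - 1.0) <= tol:
        return "Unchanged"
    return "Increase" if factor > 1.0 else "Decrease"


# Enumerate attitudes (beta2) against beliefs (lam) with illustrative values.
for beta2 in (2.0, 1.0, 0.5):
    row = [effect_on_poor_weight(beta2, lam) for lam in (-0.5, 0.0, 0.5)]
    print(f"beta2 = {beta2}: {row}")
```

Under this assumption, the weight changes only when the participant both believes in a causal effect (λ ≠ 0) and is not indifferent to income-caused health inequality (β_2 ≠ 1).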

Appendix D.1 Response of Allocations to Multipliers
We pool the Treatment A data over participants (k) and rounds (t) and estimate the following regressions of resource shares and health shares, respectively, on relative multipliers, where i indicates individuals and ν_k and µ_k are random effects at the participant level.

Since h_ikt = p_ikt y_ikt, b_y = 0 implies b_h ≈ 1 and b_h = 0 implies b_y ≈ −1. When b_y ≈ −1 and b_h = 0, resources are allocated to fully offset productivity differences and keep health shares equal. Priority is given to reducing health inequality. As both b_y and b_h increase, more resources and health are given to individuals with higher multipliers, which increases total health and health inequality.

To explore heterogeneity in responses to the multiplier, we estimate participant-level regressions like eq.(D1) and eq.(D2) (without the random effects) using data from all rounds of Treatment A for each participant. Figure D1 plots the distributions of the participant-level estimates of b_y and b_h, with the respective 95% confidence intervals. There is extensive and significant heterogeneity in responsiveness to the relative multipliers. For a substantial percentage of participants (≈ 27.9%), the null of b_h = 0 is not rejected (at 5% significance level). These participants allocate health equally between individuals. Since this is a relatively easy rule to follow, the confidence intervals are narrow for those with a point estimate close to b_h = 0. For a small minority (≈ 6.2%) we do not reject the null of b_y = 0 (and therefore b_h = 1). These participants do not change the resource shares as the multiplier changes, and so they give more health to individuals for whom resources are more productive. Approximately 14.8% of the participants prioritise efficiency and so give more resources, and therefore health, to individuals with higher multipliers (i.e. b_h > 1). Around half (≈ 49.3%) of participants give fewer resources to individuals with higher multipliers, whilst still ensuring that these individuals end up with larger health shares (i.e. 0 < b_h < 1). These participants sacrifice some efficiency to get less inequality.
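The link between the two slopes can be illustrated with a minimal sketch. It does not reproduce eq.(D1) and eq.(D2): as a simplifying assumption it regresses log shares on log multipliers after demeaning within each round, in which case h = p × y implies b_h = b_y + 1 exactly. The stylised allocation rule (resources proportional to p raised to a power), the multipliers and the function names are hypothetical.

```python
import math


def within_round_slope(x_rounds: list[list[float]], y_rounds: list[list[float]]) -> float:
    """OLS slope of y on x after demeaning both variables within each round."""
    num = 0.0
    den = 0.0
    for xs, ys in zip(x_rounds, y_rounds):
        mean_x = sum(xs) / len(xs)
        mean_y = sum(ys) / len(ys)
        num += sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        den += sum((x - mean_x) ** 2 for x in xs)
    return num / den


def slopes_for_rule(exponent: float, rounds: list[list[float]]) -> tuple[float, float]:
    """Return (b_y, b_h) when resource shares are proportional to p ** exponent."""
    log_p, log_y, log_h = [], [], []
    for multipliers in rounds:
        raw = [p ** exponent for p in multipliers]
        y_share = [r / sum(raw) for r in raw]
        health = [p * y for p, y in zip(multipliers, y_share)]
        h_share = [h / sum(health) for h in health]
        log_p.append([math.log(p) for p in multipliers])
        log_y.append([math.log(y) for y in y_share])
        log_h.append([math.log(h) for h in h_share])
    return within_round_slope(log_p, log_y), within_round_slope(log_p, log_h)


rounds = [[1.0, 0.5, 0.33], [0.33, 0.5, 1.0], [1.0, 1.0, 0.5]]

for label, exponent in [("equal health", -1.0), ("equal resources", 0.0), ("efficiency-leaning", 0.5)]:
    b_y, b_h = slopes_for_rule(exponent, rounds)
    print(f"{label}: b_y = {b_y:.3f}, b_h = {b_h:.3f}")
```

In this stylised setting, a participant who equalises health has b_y = −1 and b_h = 0, one who keeps resource shares fixed has b_y = 0 and b_h = 1, and any rule that tilts resources towards higher multipliers has b_h > 1.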

Appendix D.2 Response of Allocations to Income
We pool the Treatment B data over participants (k) and rounds (t) and estimate the following regressions for health shares, where i indicates individuals, x_ikt is individual income, and µ_k are random effects at the participant level. In different regressions, we specify f(x_ikt) as income rank within round t, categories of income level, log income, and income share within round t.

The estimates in Table D2 show that regardless of how income is specified it has a large and significant effect on the resources allocated to an individual and, hence, the share of health they get. For example, using income rank, the richest individual, on average, ends up with a health share that is 3.69 percentage points smaller than the share of the poorest individual. Participants are, on average, pro-poor.

separate regressions for the cases in which the poorest individual is the a) least productive, p_ikt < p_jkt ∀ j ≠ i, and b) most productive, p_ikt > p_jkt ∀ j ≠ i. Table D3 presents estimates of these regressions. In rounds where the poorest individual is also the least productive (lowest multiplier), information about the causal impact of income on productivity has no effect on the share of health allocated to the poorest individual. When that individual is the most productive, there is a significant, albeit small, effect. Increasing the percentage of the variation in the multipliers that is explained by income differences from 0% to 100% is estimated to increase the share of health given to the poorest individual by 1.65%.

Figure D2 shows the distribution of participant-specific estimates of regression model eq.(D4) (without the random effects), again stratified by whether the poorest individual is the least or most productive. Most participants do not respond to changes in the causal information: the null is not rejected (5% significance) for 88.1% and 90.8% of participants when the poorest individual is the least and most productive, respectively. When the poorest individual is most productive, the proportion of participants who give more health to the poorest when told that income causes a larger percentage of the difference in multipliers is greater than the proportion who reduce the health share of the poorest when given such information. This difference results in the positive significant effect estimated with the pooled data shown in Table D3.
where L_t is the likelihood in round t for the data and estimates and L^U_t is a likelihood for a uniform distribution draw. MPL = (1/T) Σ_t PL_t. If MPL = 0.5, model fit to data is no better than the fit to uniform distribution draws. As MPL → 1, data fit improves. MSE = (1/T) Σ_t (ỹ_i − ỹ*_i)^2 is the Mean Square Error. GOF is increasing with MPL and decreasing with MSE. Robust standard errors in parentheses. p-values: * p < 0.10, ** p < 0.05, *** p < 0.01.

Inequality aversion is not significantly different for the excluded participants. However, precision is significantly lower for these participants and both GOF measures show significantly worse fit to the estimates obtained for them. This suggests that those who are excluded from the analysis because of apparent poor comprehension of the task indeed give estimates that imply that they have less precise optimal allocations and they make more errors through choices that deviate from those allocations.

Appendix D.5 Distributions of Parameter Estimates
Figure D3 plots the empirical CDFs of participant-specific estimates of parameters obtained separately from Treatment A, B, and C data. The top-left panel reveals that Treatment A gives a larger proportion of lower estimates of ε, while Treatment C gives a larger proportion of higher estimates.
The top-right panel plots empirical CDFs of β from Treatment B and β_1 from Treatment C. These parameters reflect aversion to income-related health inequality irrespective of its causal source. Estimates of β and β_1 may differ because they are obtained from different scenarios (multiplier and income combinations). Using Treatment B, 27.0% of participants have 0.95 ≤ β ≤ 1.05, and so, approximately, display no discrimination in favour of either poorer or richer individuals. With Treatment C, the respective percentage of income neutral participants with 0.95 ≤ β_1 ≤ 1.05 is lower, at 23.7%. The percentage displaying pro-poor weights is slightly larger for C (50.4%, β_1 > 1.05) than for B (49.3%, β > 1.05). The percentage displaying pro-rich weights is also larger for C (25.8%, β_1 < 0.95) than for B (23.7%, β < 0.95).

The bottom-left panel shows CDFs of the precision parameter (σ) estimates. Treatment C appears to produce the most precise estimates (largest σ). This is partly an artefact of the smaller number of rounds in this treatment. Up to about the 40th percentile, Treatment B is more precise than Treatment A. But the latter treatment gives a larger proportion of very precise estimates. There are only 8.0%, 6.5%, and 3.0% of participants from A, B, and C, respectively, for whom σ < 10. Comparing these estimates with the respective pooled estimates of 8.09, 8.71, and 8.99 confirms that allowing for heterogeneity across participants produces predicted allocations that are far more precisely centred around the optimal allocation.

Table D5 shows rank correlation coefficients between participant-level preference parameters, estimated within the three treatments: A, B and C. Between treatments, there are strong and significant correlations for each of ε and σ for all treatments, and between β and β_1, estimated in Treatment B and C. This shows consistency in the estimated preference parameters across different treatments. Within treatments, we do not see a significant correlation between ε and β in Treatment B, nor between ε and β_1 or β_1 and β_2 in Treatment C. We do see positive and significant correlations between ε and σ; this is in part explained by the low noise for maximin participants. We also see a positive correlation between ε in Treatment A and β in Treatment B, but not between ε and β in Treatment B.

Appendix E.3 Health-rank-dependent Weights
The model given by eq.(1), eq.(2) and eq.(4) is subject to the criticism that it makes no distinction between inequality aversion and the concave valuation of health (Bleichrodt, Diecidue, and Quiggin, 2004). The parameter ε can be interpreted as reflecting either inequality aversion or diminishing marginal utility of health. The rank-dependent QALY model (Bleichrodt, Diecidue, and Quiggin, 2004; Bleichrodt, Doctor, and Stolk, 2005) makes this distinction.
In our experimental setup, participants choose optimal health levels, which determine health ranks. It is therefore not possible to separately identify nonlinear aggregation of health (QALYs) and health-rank-dependent weights. A tractable alternative is to define the weights as a function of the fractional rank of the multipliers (p i ), such that the weights reflect the "potential" health rank of an individual. Using eq.(5), and replacing incomes by the productivity factors, with p i ≥ p i−1 , we specify the weights as in eq.(E1).
Using this alternative weighting function, we estimate preference parameters, pooling data from Treatment A across all participants and rounds. The estimates and goodness-of-fit statistics are shown in Table E2. The estimate of β p is smaller than 1, which would indicate that participants give lower weight to individuals with lower multipliers (productivity). Inclusion of β p appears to slightly increase the estimate of ε. However, β p is not significantly different from 1 (i.e. constant weights). Moreover, the goodness-of-fit statistics show little difference between the two models. Therefore, there is no strong support for a model with non-constant weights when individuals are not labelled by income.
Table E2 note: ε reflects the (representative) SDM's aversion to health inequality; see eq.(2). β p determines rank multiplier weights in eq.(E1). σ reflects precision of the SDM; see eq.(13). In brackets are 95% bootstrap confidence intervals obtained with the percentile method. MPL is Mean Proportional Likelihood, MSE is Mean Square Error and AIC is Akaike Information Criterion as defined in Table 4.

Appendix F Details of Simulated Policy Evaluation
We simulate a population of 100,000 individuals characterised by income and health. Annual income, x i , is drawn from a (rescaled) Beta distribution, where x i ∼ Beta(1.1, 10.5) × 360,000. Baseline health (QALYs), h i , is a positive and stochastic function of log income, where h i = 45 + 2 log(x i ) + υ i , and υ i ∼ N(0, 3). We assume there are two policies, A and B, that have the same cost and do not change the distribution of income. We simulate the potential outcomes of each individual's health with Policy A and B, where h iA = h i + 0.75 log(x i )(1 − r(x i )) for Policy A, and h iB = h i + 0.75 log(x i ) r(x i ) for Policy B, with r(x i ) denoting income rank. Intuitively, we can think of Policy A as allocating proportionately fewer resources to individuals the higher their income rank, whilst Policy B allocates proportionately more. The productivity of these resources, in determining health, is a positive function of log income. The marginal and joint simulated distributions of income and health are shown in Figure F1. These simulated distributions are intended to demonstrate the potential to use our estimates to evaluate and rank policies. The minimum data requirement is a distribution of health (QALYs) across individuals or groups for each policy. If data on incomes (or income ranks) of the individuals or groups are also available, then the evaluation can account for aversion to income-related health inequality as well as aversion to pure health inequality.
A data file containing the distributions of estimated parameters and code in R to conduct the analysis are available in an online repository, on Mendeley Data: https://doi.org/10.17632/9vy6f6g5k3.1. This includes an RData file, with the preference parameters estimated in this paper, alongside an R script file, which provides a function to allow researchers and policy makers to evaluate other policies or interventions. All that is required to use our parameters to conduct a distributionally sensitive policy evaluation is a dataset containing a treatment indicator, alongside health (and income) levels of individuals (or groups) with and without treatment. We provide example code to demonstrate how to use this function in R.
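For readers who want to reproduce the structure of the simulated population, the data-generating process described in this appendix can be sketched in Python. This is an illustrative sketch and not the R script from the repository. It assumes natural logarithms, treats the 3 in N(0, 3) as a standard deviation, and takes r(x i ) to be the fractional income rank in (0, 1]. These readings, the function name simulate_population and the seed are assumptions made for this example.

```python
import math
import random


def simulate_population(n=100_000, seed=0, noise_sd=3.0):
    """Simulate income, baseline health and health under Policies A and B.

    Assumptions (not confirmed by the text): natural log, noise_sd is a
    standard deviation, and rank is the fractional income rank in (0, 1].
    """
    rng = random.Random(seed)
    # Annual income: rescaled Beta(1.1, 10.5) draw.
    income = [rng.betavariate(1.1, 10.5) * 360_000 for _ in range(n)]

    # Fractional income rank: position in the income ordering divided by n.
    order = sorted(range(n), key=lambda i: income[i])
    rank = [0.0] * n
    for position, i in enumerate(order, start=1):
        rank[i] = position / n

    population = []
    for x, r in zip(income, rank):
        log_x = math.log(x)
        h = 45 + 2 * log_x + rng.gauss(0, noise_sd)  # baseline QALYs
        gain = 0.75 * log_x  # health gain scale, increasing in log income
        population.append({
            "income": x,
            "rank": r,
            "h": h,
            "h_A": h + gain * (1 - r),  # Policy A: gain falls with income rank
            "h_B": h + gain * r,        # Policy B: gain rises with income rank
        })
    return population


if __name__ == "__main__":
    pop = simulate_population(n=10_000, seed=1)
    for key in ("h", "h_A", "h_B"):
        print(key, round(sum(p[key] for p in pop) / len(pop), 2))
```

Each record holds the health outcome of one individual under both policies, which is the minimum input described above for ranking policies with the estimated preference parameters.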

Appendix G Causal Beliefs
After exposure to Treatment B and before exposure to Treatment C, we elicited beliefs about a causal effect of income on the multiplier (Appendix A.4). Before doing so, we gave a reminder that the multiplier determines health gained from resources. We told participants that, in any round of the experiment, the multipliers could differ between the three individuals by chance and, possibly, because of their characteristics.
We asked participants whether they would expect income to affect the multiplier. They used a slider to answer on a scale from -100 indicating strong belief that lower income increases the multiplier to 100 indicating strong belief that higher income increases the multiplier, with 0 indicating that income has no effect on the multiplier. We refer to this variable as the General Causal Belief. Two participants did not complete the belief elicitation exercise and so the sample size is 335 for analyses using these data.
The left panel of Figure G1 shows the cumulative distribution of this General Causal Belief. A median of 28 indicates weak belief that higher income causally increases the multiplier and so causes richer individuals to gain more health from a given allocation of resources. Almost two thirds (64.8%) of participants believe in this pro-rich generation of health. Less than a quarter (22.7%) of participants believe the causal relationship is pro-poor. Around an eighth (12.5%) express a belief that income has no impact on the multiplier and so on the generation of health.
After eliciting the General Causal Belief, participants were shown two scenarios with multipliers and incomes of three individuals as in rounds 7 and 8 of Treatments B and C. In both scenarios, the multipliers are {0.33, 0.5, 1}. In one scenario, the multipliers increase monotonically with income across the three individuals (pro-rich). In the other, they decrease with income (pro-poor). In each case, we asked participants to use a slider, which was set to a random starting position between 0% and 100%, to indicate the percentage of the differences between the multipliers that was caused by the income differences (Appendix A.4). We refer to these responses as Pro-Rich Causal Belief and Pro-Poor Causal Belief for the scenarios in which the multipliers increase and decrease with income, respectively.
Figure G1: Distributions of Causal Beliefs
Note: Left panel shows the empirical cumulative distribution function (CDF) of General Causal Belief: the extent to which participants believe that income causally increases the multiplier (larger positive value) or that income causally reduces the multiplier (larger negative value). Right panel shows CDFs of Pro-Rich Causal Belief and Pro-Poor Causal Belief, which are beliefs about the percentage of multiplier differences caused by a positive and negative income effect, respectively. n=335 for both panels.
The right panel of Figure G1 shows CDFs of the Pro-Rich Causal Belief and the Pro-Poor Causal Belief. In the former case, participants tend to believe that a higher percentage of the variation in the multipliers is caused by income. The median belief is that income causes 68% of the differences in multipliers when they are positively associated with income. The respective median belief is 59% when the multipliers are negatively associated with income. Together with the distribution of the General Causal Belief, these results indicate stronger belief in a positive causal effect of income on the multiplier.
To check the validity of the belief elicitation instruments and participants' comprehension of them, we regress the General Causal Belief on the Pro-Rich Causal Belief and the Pro-Poor Causal Belief. We re-scale the latter two beliefs to 0-1. Table G1 shows that the General Causal Belief is significantly positively associated with the Pro-Rich Causal Belief and significantly negatively associated with the Pro-Poor Causal Belief. This indicates consistency in responses between the questions. Participants who attribute a larger percentage of multiplier differences that are positively associated with income to a positive causal effect of income (larger Pro-Rich Causal Belief) report stronger belief that income causally increases the multiplier (positive and larger General Causal Belief). Participants who attribute a larger percentage of multiplier differences that are negatively associated with income to a negative causal effect of income (larger Pro-Poor Causal Belief) report stronger belief that income causally reduces the multiplier (negative and larger in magnitude General Causal Belief). Although the partial associations and the R 2 statistic are small, the significance of the relationships between the elicited beliefs in the expected directions suggests that participants comprehended what they were being asked to do.
Generally, participants who believe that a larger percentage of the differences in the multipliers is caused by income allocate resources to ensure that the poorest individual gets a larger share of health. This is true irrespective of whether the distribution of the multipliers favours the rich or the poor, although differences occur at the bottom of the distribution in the first case and toward the top in the second.
Regressing the health share to the poorest on the causal belief variables reveals that the share has the strongest and most significant association with beliefs when the multipliers increase with income (pro-rich). In column (1) of Table G2 we use rounds of Treatment B in which the multipliers are {0.33, 0.5, 1} and increase monotonically with income. In this case, we estimate that a 100 percentage point increase in the share of the difference in the multipliers believed to be caused by income is significantly associated with a 5.5 percentage point increase in the health share to the poorest. In column (2), we use rounds in which the multipliers decrease with income (pro-poor). The estimated association between the health share to the poorest and the causal belief is still positive, but it is smaller and not significant. In column (3), we pool all 10 rounds and regress the health share to the poorest on the Pro-Rich and Pro-Poor Causal Beliefs. This reveals that the health share to the poorest is positively and significantly associated with the Pro-Rich Causal Belief, but not significantly associated with the Pro-Poor Causal Belief. Although these beliefs are not exogenously manipulated, these results suggest that a stronger belief in a pro-rich causal relationship between income and health leads participants to prioritise the poor. Further research could try to disentangle the longer run connection between beliefs and social preference formation.
0.0194 0.0024 0.0076
Notes: Each column gives estimates from OLS regression of health share to the poorest on Pro-Rich and/or Pro-Poor Causal Beliefs. Columns (1) and (2) use subsets of rounds with multipliers monotonically increasing (Pro-Rich) and decreasing (Pro-Poor), respectively, with income. Column (3) uses all 10 rounds. Robust standard errors (SE) in parentheses. p-values: * p < 0.10, ** p < 0.05, *** p < 0.01.
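To make the interpretation of the column (1) coefficient concrete, the following Python sketch fits a univariate OLS regression of a health share on a belief variable rescaled to 0-1. It is an illustration only. The data are synthetic and constructed so that the slope equals 0.055, matching the 5.5 percentage point association quoted above; the intercept of 0.30, the residuals and the function name are hypothetical. The sketch does not compute the robust standard errors used in Table G2.

```python
def ols_slope_intercept(x, y):
    """Closed-form OLS estimates for a regression of y on a constant and x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Synthetic data for illustration; NOT the experimental data.
# Belief is the Pro-Rich Causal Belief rescaled from 0-100% to 0-1.
belief = [0.0, 0.25, 0.5, 0.75, 1.0]
# Residuals chosen to sum to zero and to be uncorrelated with belief,
# so the fitted slope equals the slope used to build the data.
residual = [0.01, -0.02, 0.02, -0.02, 0.01]
share = [0.30 + 0.055 * b + e for b, e in zip(belief, residual)]

slope, intercept = ols_slope_intercept(belief, share)
# Moving belief from 0 to 1 (a 100 point increase) changes the predicted
# health share of the poorest by 100 * slope percentage points.
print(round(100 * slope, 1), "percentage points")
```

Because the belief is on a 0-1 scale, the slope is read directly as the change in the health share associated with moving from attributing none to attributing all of the multiplier differences to income.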