The econometrics of happiness: Are we underestimating the returns to education and income?

This paper describes a fundamental and empirically conspicuous problem inherent to surveys of human feelings and opinions in which subjective responses are elicited on numerical scales. The paper also proposes a solution. The problem is a tendency by some individuals, particularly those with low levels of education, to simplify the response scale by considering only a subset of possible responses such as the lowest, middle, and highest. In principle, this "focal value rounding" (FVR) behavior renders invalid even the weak ordinality assumption often used in analysis of such data. With "happiness" or life satisfaction data as an example, descriptive methods and a multinomial logit model both show that the effect is large and that education and, to a lesser extent, income level are predictors of FVR behavior. A model simultaneously accounting for the underlying wellbeing and for the degree of FVR is able to estimate the latent subjective wellbeing, i.e., the counterfactual full-scale responses for all respondents, the biases associated with traditional estimates, and the fraction of respondents who exhibit FVR. Addressing this problem helps to resolve a longstanding puzzle in the life satisfaction literature, namely that the returns to education, after adjusting for income, appear to be small or negative. Due to the same econometric problem, the marginal utility of income in a subjective wellbeing sense has been consistently underestimated.


Introduction
Now firmly entrenched in the economics literature, in national statistical agency data collection, and in the dialogue about progress and wellbeing, survey-based subjective evaluations of life¹ are the basis for estimating welfare benefits and costs of everything from inflation and unemployment, to air pollution and being married (e.g., Blanchflower et al., 2014; Levinson, 2012; Stutzer and Frey, 2006). Estimates of the psychological benefit of increased income, using this approach, are five decades old, and those evaluating the net individual return of additional education have been carried out for at least three decades. In terms of optimally allocating human resources, not much could be more central than knowing the marginal utility of income and of education.

Responding to life evaluation questions
However, the coherence and value of subjective evaluations of life rely on a series of considerable cognitive tasks to be performed in short order by the respondent. When asked,² "Overall, how satisfied are you with life as a whole these days, measured on a scale of 0 to 10?" a respondent must in some sense (i) conceive of the domains, expectations, aspirations or other criteria salient to her sense of experienced life quality or satisfaction; (ii) assemble evidence pertaining to each ideal, such as recent affective (emotional) states, significant events, and objective outcomes; (iii) appropriately weight and aggregate this evidence according to its importance to overall life quality, and (iv) project the result onto the discrete numerical scale specified in the question. This is without doubt a tall order, and any embrace of subjective wellbeing (SWB) data, and especially the headline measure of life satisfaction (LS), rests on their remarkable reproducibility and apparent cardinal comparability, possibly along with the principle that any objective indicator of experienced wellbeing must ultimately be accountable to a subjective one. While various studies have sought, with limited success, to find differences in interpretation of the LS question or norms of expression across cultures and languages (Helliwell et al., 2010; Exton et al., 2015; Lau et al., 2005; Clark et al., 2005), an important fact is that, uniformly across cultures, responding to the question is cognitively demanding. This paper focuses specifically on the consequences of an apparent heterogeneity across respondents in their ease with the final, quantitative step in the process outlined above.
The crux is that people with less facility with numbers may simplify the numerical response scale for themselves. In particular, the evidence below shows that some respondents restrict the set of numerical options under consideration to a three-point scale consisting of the bottom, middle, and top options, rather than the full set offered. This can be expected to introduce complex biases in mean life satisfaction and in estimated marginal effects on life satisfaction, in particular with respect to education and other correlates of numerical literacy itself.

I present evidence of the prevalence and quantitative significance of this problem, with implications for the interpretation and analysis of all numerical, subjective response scales. The language and empirical examples all focus on the case of single-item SWB questions, mostly LS (Cheung and Lucas, 2014), which underlie the field of the "economics of happiness". While most empirical studies make use of a cardinal interpretation of the response scale in the life satisfaction question, and at least an ordinality assumption is universal,³ the "focal value rounding" (FVR) behavior, described above, introduces a conspicuous violation of the ordinality of response options. Because a number of governments are gearing up to carry out cost/benefit analyses using regressions of LS data for budgeting and program evaluation (Frijters et al., 2020; Frijters and Krekel, 2021; Happiness Research Institute, 2020; Grimes, 2021; Department of Finance, 2021; UK Treasury, 2021), proper econometric accounting for FVR may have practical importance.
In order to estimate the size of systematic biases associated with widely used methods of inference based on SWB reports, I present a model which accounts for the FVR phenomenon and which shows why biases on estimates can be large or small and positive or negative. The model also quantifies the fraction of respondents in a sample who have chosen an alternate, simplified response scale, a value I call the Focal Value Rounding Index, or FVRI.
The rest of this paper proceeds as follows. The remainder of the Introduction reviews some stylized facts related to education and wellbeing in the happiness literature, and mentions some points of history in the development of SWB survey questions like LS. Next, Section 2 will convince the reader that there is a measurement problem with quantitative, subjective scales like LS that is conspicuous, ubiquitous, and strongly correlated with educational attainment, and that it has a natural explanation supported by the behavioral evidence. Then Section 3 presents the formal model in which a mixture of high- and low-numeracy respondents treat the response scale differently. Section 4 validates the estimation and identification approach using synthetic data and explores the complexity of biases that can result from FVR. Section 5 presents the main empirical estimates of the relationship between education, income, and wellbeing, using a large social survey from Canada. Section 6 reexamines three previously published studies, along with a ranking of U.S. states, as applications to investigate the extent of bias in existing published literature as well as in popular happiness rankings. In these empirical applications, previously anomalous but reproducible findings include evidence that a disadvantaged population reports high life satisfaction, and that the returns to extra years of education after primary school are negative, especially when conditioned on income. These surprising findings are overturned when taking into account focal response behavior. A summary and a perspective on future directions are in Section 7.

Effects of education and income on subjective wellbeing
Education may be expected to confer welfare benefits not just through higher income but also through better health behaviors and enhanced social capital of various forms with intrinsic benefit (e.g., Helliwell and Putnam, 2007; Powdthavee et al., 2015), as well as through some kind of psychological capital which captures intrinsic benefits of learning or knowledge, or which complements other consumption (for instance, possibly literature, fine art, or the night sky). However, among the more surprising stylized facts in the economics of happiness is that formal education does not help much to explain LS once income⁴ is accounted for (e.g., Layard, 2011; Frijters and Krekel, 2021).⁵ Similarly, although the literature on the importance of income and income growth on LS is enormous and involves a large potential role of consumption externalities (Barrington-Leigh, 2014), one may summarize the findings by saying that income has been found to be a weak predictor of LS in comparison to other, less market-mediated parts of life (e.g., Blanchflower and Oswald, 2004; Hamilton et al., 2016; Layard, 2011; Frijters and Krekel, 2021).⁶ This paper does not aim to explore all the reasons for this well-established evidence about quality of life from subjective response data. Instead, it characterizes a measurement error in which those with lower education and, as a proxy, those with lower income, may be more likely to under-utilize the LS response options in such a way that tends to bias their reported life satisfaction. In general, resulting biases on marginal effects could exist in either direction, but as described below they are more likely to be downward, meaning that they may go some way to explaining the education anomaly and to revise upward, if modestly, the estimated importance of income for supporting SWB.

Evolution of precision in subjective, quantitative reports
The history of survey questions on subjective assessments mirrors in part technological norms. Early innovators in monitoring SWB in social and household surveys tended to use a four point or five point scale, typically with Likert-style verbal response options. In such questions, the numbers were not meant as cues for the respondent. In some populations, most respondents chose one of the top two options, limiting the variation, or precision. As limitations of paper survey media have been erased by the adoption of computer aided interviews, the resolution of these subjective scales has expanded. However, with more than five or seven response options, verbal cues are typically not provided except for the highest and lowest response options. Responses instead become numerical. For instance, after many years of asking LS questions with a variety of scales, Statistics Canada settled over a decade ago on a particular wording with an 11 point scale.⁷ The OECD (2013) has also developed recommendations for standardizing the way such questions are asked by all national statistical agencies. The de facto standard for LS now is an 11-point scaling, from 0 to 10, with the lower extreme meaning, for example, "not at all satisfied", the upper signifying "completely satisfied", and the interpretation of the remaining values left up to the respondent.
An older literature sought to determine the optimal number of response options in survey questions with verbal cues for each option. For instance, it may be that in an oral interview, i.e., with no visual cues, four or five responses are the maximum that can be handled without confusion or overload (Bradburn et al., 2004).
When the scale is explicitly numeric, as with modern LS measures, there also arises a trade-off between the cognitive load imposed by a scale and the precision it allows. From the respondent's point of view, this trade-off is between the opportunity for self-expression and the cost of cognitive processing. The survey designer wishes to allow for precise responses in order to capture variability among respondents and over time, while not demanding too much. Overburdening would result, at best, in the respondent not fully optimizing her answer or not properly interpreting or using the given range of responses (OECD, 2013). Various studies on this balance have tended to favor 11-point quantitative scales over coarser option sets (e.g., 7-point scales) as well as over nearly continuous options (Alwin, 1997; Kroh et al., 2006; Saris et al., 1998; OECD, 2013; Weng, 2004).⁸

Descriptive evidence
A small number of studies have remarked in some way on the use of focal values, but without a full account or explanation.⁹ Dolan et al. (2011) mention that LS ratings in one study are positively associated with life circumstances as one would expect, except at the top of the scale, where "those rating their life satisfaction as 'ten out of ten' are older, have less income and less education than those whose life satisfaction is nine out of ten". They speculate about a reason for this observation that is unrelated to cognitive limitations, but declare that "This issue warrants further research". Conti and Pudney (2011) describe focal value behavior as a response to the existence of verbal cues, present on only three out of seven response options. Landua (1992) analyses response transition probabilities in the German Socio-Economic Panel, and Frick et al. (2006) confirm his report that respondents have a tendency to move away from the endpoints over time. In fact, this could be driven largely by the FVR behavior diminishing as panel participants, especially those with low numeracy, gain familiarity and comfort with the scale.

Educational attainment
Simply inspecting their LS distributions, stratified by education level, might have led these authors to the hypothesis developed in this paper. For illustrative purposes I appeal to one cycle from the Canadian Community Health Survey (CCHS), a large annual cross-section which includes an 11-point life satisfaction question as well as educational attainment.¹⁰ Conditioning SWB responses on educational attainment reveals a striking feature (Figure 1). The relative frequencies of each focal value (0, 5, and 10) decrease with increasing education level. While the lowest education category shows four peaks, the distribution of responses in the highest education category features what would be a unimodal distribution around SWL=8, except for a slight enhancement at SWL=0. In addition to Figure 1, several other lines of evidence support the interpretation that scale simplification is a specific response to cognitive challenge, a model to be formalized in Section 3.

Difficulty responding
Another indication that the apparent tendency to simplify the response scale has to do with the difficulty of answering the question, as it is posed, comes from noticing that respondents with less education are more likely to refuse to answer the LS question at all.Appendix Table F.1 shows that response rates to the LS question, although close to 100%, are strictly increasing with educational attainment.

Unordered choice model
The existence of FVR behavior implies that SWB response scales cannot safely be assumed to be ordinal. For example, those with lower education may, all else equal, experience lower life satisfaction but be systematically inclined to report a higher value due to rounding up from a 3 or 4 to 5, or from 8 or 9 to 10. It is possible, therefore, that on average those reporting 9 could be happier than those reporting 10. Traditional methods used in econometric inference from LS (such as OLS, ordered logit, ordered probit, and related time series and instrumented analogues) leverage strong assumptions about the symmetry of effects of explanatory variables on each step of the response scale, as well as assuming cardinality or at least ordinality among response values. Those models are therefore not flexible enough to account for the heterogeneous influence of predictors like education on focal and non-focal response values.¹² An alternative approach is to relax the ordinality assumption for response options, and model the probability of each response independently, subject only to the constraint that the probabilities add up to one. The multinomial (polytomous) logit model¹³ does this.
Figure 2 shows marginal effects of education and income on response probabilities of each of the 11 points in the LS scale, from a multinomial logit model using education, logarithmic income, age, and age² as predictors for the sample shown in Figure 1. Under an ordinality assumption, one might expect marginal effects to rise monotonically with response value, since a better circumstance like education or income should lead to an increase in the relative probability of response s + 1 as compared with response s. Indeed, other than the focal response values 0, 5, and 10, the marginal effect of one step higher educational attainment (for instance, graduating from high school) is weakly increasing in reported LS. By contrast, the effects on the focal value responses are, with high statistical significance, negative¹⁴ outliers far below what would be expected based on the pattern of adjacent values. They show that more education significantly reduces the probabilities of each focal value response.

Figure 2: Multinomial logit estimations provide a diagnostic tool for detecting predictors of focal value behavior. For educational attainment (quantified on a 1-4 scale) marginal effects show a monotonic pattern with increasing response value, except for the remarkable outliers at 0, 5, and 10. These indicate that education and, simultaneously but to a lesser degree, income are significant predictors of the tendency to use a simplified response scale. Error bars denote 95% confidence intervals.
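To make the diagnostic concrete, the following is a minimal sketch of how multinomial logit response probabilities and the marginal effect of a single predictor can be computed. The intercepts and slopes are hypothetical values chosen only so that the focal responses (0, 5, 10) have lower slopes than their neighbors; they are not estimates from the CCHS, and the function names are illustrative.

```python
import math

def mnl_probs(x, intercepts, slopes):
    """Multinomial logit probabilities for one predictor value x."""
    utilities = [a + b * x for a, b in zip(intercepts, slopes)]
    m = max(utilities)                      # subtract max for numerical stability
    expu = [math.exp(u - m) for u in utilities]
    total = sum(expu)
    return [e / total for e in expu]

def mnl_marginal_effects(x, intercepts, slopes):
    """dP_k/dx = P_k * (b_k - sum_j P_j * b_j); the effects sum to zero."""
    p = mnl_probs(x, intercepts, slopes)
    avg_slope = sum(pk * bk for pk, bk in zip(p, slopes))
    return [pk * (bk - avg_slope) for pk, bk in zip(p, slopes)]

# Hypothetical parameters for responses 0..10 (illustration only).
INTERCEPTS = [-2.0, -3.0, -2.5, -2.0, -1.5, 0.0, -0.5, 0.5, 1.0, 0.5, 0.5]
SLOPES = [-0.5, 0.0, 0.05, 0.1, 0.15, -0.4, 0.25, 0.3, 0.35, 0.4, -0.3]

probs = mnl_probs(0.0, INTERCEPTS, SLOPES)
effects = mnl_marginal_effects(0.0, INTERCEPTS, SLOPES)
for k, (p, me) in enumerate(zip(probs, effects)):
    print(f"response {k:2d}: P = {p:.3f}, marginal effect = {me:+.4f}")
```

With these made-up parameters the marginal effects at 0, 5, and 10 are negative while those at neighboring high responses are positive, which is the qualitative outlier pattern described for Figure 2.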
However, the effect sizes are hard to interpret because they are averages over the entire sample. For instance, the education coefficient for LS = 10 is an average effect over high types, for whom higher education increases the chance of reporting 10, and low types, for whom higher education decreases that chance. In order to separate those effects, a more structured mixture model approach, described below, is required.

Precision and self-expression
As a final piece of empirical motivation for the modeling approach developed below, I note that when excess precision is offered in an SWB scale, respondents appear to make a costly effort to choose round numbers. Specifically, Appendix Figure F.2 shows the distribution of responses from a computer-based SWB survey question framed on a 0-10 scale but with an available resolution of 0.1. There are clearly favored responses at every integer and half-integer value. The response interface was a graphical slider which gave no preference for any particular values. Thus, the prevalence of rounded values indicates that extra effort in the form of fine manual control was exerted in order to leave the slider precisely on a half- or whole-integer value. This can be interpreted as evidence of effort to faithfully communicate a mental result, motivated by the drive for self-expression (Alwin, 1997; OECD, 2013).¹⁵

Cognitive mixture model
Motivated by the evidence above, the enhanced use of focal values can be interpreted as an indication that respondents have simplified their cognitive task by coarsening the numerical scale. Because FVR behavior is inversely associated with education and math skills, I focus on "numeracy" as one major influence on scale choice. The two-type mixture model below is based on the assumption that the cognitive processes of respondents differ only in the execution of step (iv) described in the second paragraph of Section 1. That is, an internal representation of overall wellbeing exists in a similar way across the two groups, who subsequently project that assessment onto either the full scale (high numeracy respondents) or a subset consisting of the bottom, central, and top values (low numeracy respondents).
For each of the two types, latent wellbeing is mapped onto a discrete response scale as in a standard, i.e., canonical, ordered logit model. That is, given a continuous, latent subjective assessment $S^\star$ modeled in terms of individual characteristics $x$ as $S^\star = x'\beta_S + \varepsilon$, the cumulative probability of discrete responses $k$ is given by:

$$\Pr(s \le k \mid x) = \Phi(\alpha_k - x'\beta_S) \qquad (1)$$

where $\alpha_k$ are a sequence of threshold values, $\alpha^H_k$ separating the full set of observed responses $\{0, 1, \ldots, 10\}$, or $\alpha^L_k$ for the focal subset $\{0, 5, 10\}$, and $\Phi(\cdot)$ is the cumulative distribution function of the unexplained portion $\varepsilon$ of $S^\star$. Use of the logistic distribution for $\Phi(\cdot)$ makes this an ordered logit model.
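As an illustration of this mapping, the sketch below turns a linear index and a set of thresholds into response probabilities by differencing the logistic cumulative probabilities. The threshold values and the index 0.8 are hypothetical and chosen only for demonstration; the same function serves the 11-point and the 3-point scale.

```python
import math

def logistic_cdf(t):
    """Logistic CDF, written to avoid overflow for large |t|."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def ordered_logit_probs(xb, cutpoints):
    """Pr(response = category) for each category of an ordered logit.

    xb        : linear index x'beta_S for one respondent
    cutpoints : strictly increasing thresholds; yields len(cutpoints) + 1 categories
    """
    cumulative = [logistic_cdf(a - xb) for a in cutpoints] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)      # difference of adjacent cumulative probabilities
        previous = c
    return probs

# Hypothetical thresholds (illustration only).
ALPHA_H = [-4.0, -3.2, -2.5, -1.8, -1.0, -0.2, 0.6, 1.5, 2.5, 3.6]  # responses 0..10
ALPHA_L = [-1.5, 2.0]                                                # responses 0, 5, 10

p_high = ordered_logit_probs(0.8, ALPHA_H)   # full-scale respondent
p_low = ordered_logit_probs(0.8, ALPHA_L)    # focal-value respondent
print([round(p, 3) for p in p_high])
print([round(p, 3) for p in p_low])
```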
The high and low alternative ordered logit outcomes are combined using a simple dichotomous logit model. If $z$ is a vector of individual characteristics, possibly overlapping with $x$, which serve as a measure of numeracy, then the probability of each type is a logistic function of $z'\beta_N$ (Eq. (2)). There is no explicit consideration of costs and benefits to the respondent.¹⁶ Together, Eq. (1) and Eq. (2) form a mixture model. The probability of observing response $k$ (Eq. (3)) is the sum of the high-type and low-type ordered logit probabilities of $k$, each weighted by the probability of the corresponding type. The model is similar to the ordinal-outcome "finite mixture model" of Boes and Winkelmann (2006) except that here the mixing probability is dependent on individual characteristics (see also Everitt and Merette, 1990; Everitt, 1988; Uebersax, 1999). A more detailed account of the model is presented in Appendix A.
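A minimal sketch of the resulting mixture probability and log-likelihood follows. It assumes, as a sign convention not confirmed by the text, that the probability of being a high type is the logistic function of the numeracy index $z'\beta_N$. All parameter values and helper names are hypothetical, and the paper's own estimation is Bayesian rather than the plain likelihood evaluation shown here.

```python
import math

FOCAL = (0, 5, 10)

def logistic_cdf(t):
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def ordered_probs(xb, cuts):
    """Ordered logit category probabilities from increasing cut points."""
    cumulative = [logistic_cdf(a - xb) for a in cuts] + [1.0]
    return [c - p for c, p in zip(cumulative, [0.0] + cumulative[:-1])]

def mixture_probs(xb, zb, alpha_h, alpha_l):
    """Pr(s = k), k = 0..10, for one respondent under the two-type mixture.

    xb : wellbeing index x'beta_S
    zb : numeracy index z'beta_N; Pr(high type) = logistic(zb) is an assumed convention
    """
    p_high_type = logistic_cdf(zb)
    probs = [p_high_type * p for p in ordered_probs(xb, alpha_h)]
    for k, p in zip(FOCAL, ordered_probs(xb, alpha_l)):
        probs[k] += (1.0 - p_high_type) * p   # low types answer only 0, 5, or 10
    return probs

def log_likelihood(responses, xbs, zbs, alpha_h, alpha_l):
    """Sum of log mixture probabilities of the observed responses."""
    return sum(math.log(mixture_probs(xb, zb, alpha_h, alpha_l)[s])
               for s, xb, zb in zip(responses, xbs, zbs))

# Hypothetical thresholds and data (illustration only).
ALPHA_H = [-4.0, -3.2, -2.5, -1.8, -1.0, -0.2, 0.6, 1.5, 2.5, 3.6]
ALPHA_L = [-1.5, 2.0]
p_mix = mixture_probs(0.8, 0.0, ALPHA_H, ALPHA_L)
ll = log_likelihood([5, 8, 10], [0.2, 0.8, 1.5], [-1.0, 1.0, 0.0], ALPHA_H, ALPHA_L)
print([round(p, 3) for p in p_mix], round(ll, 3))
```

Because low types contribute probability only at the focal values, the mixture inflates the mass at 0, 5, and 10 relative to the pure full-scale distribution.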

Identification
Are the parameters in this model point-identified in principle?¹⁷ Identification is a challenge because the same predictors may be used to predict the latent numeracy variable and to predict the latent wellbeing variable. As a result, one might fear that more than one set of parameters could equally well explain observations for a given sample. Excluding the columns of $z$ from $x$ in Eq. (3) would overcome this problem. However, for an all-encompassing subjective outcome such as latent wellbeing, it is safer to assume that everything could be a determinant. More specifically, a particular interest motivating this study is to assess the bias on estimates of the wellbeing effect of education, and education is also the primary available predictor of numeracy.
With stronger assumptions, an alternative strategy to the mixture model may be able to identify parameters for latent SWB by avoiding FVR altogether. One approach would be through thin set identification, if respondents with some level of education were known never to exhibit FVR. One standard problem with this kind of identification is that it relies on an assumption of a uniform effect of a covariate across its support, as well as the absence of interaction effects with other covariates. By contrast, the mixture model approach of Eq. (3), which leverages the entire sample, has the advantage of generalizability to explicitly estimate interaction terms or other functional forms to allow for non-uniform effects.
Another approach would be through selection on the dependent variable; that is, by restricting the sample to the subset of high types who did not respond with a focal value.Because no "5"s are observed in this group, it would consist of two subsamples: those with observed s ∈ {1, 2, 3, 4} and those with s ∈ {6, 7, 8, 9}.In fact, if the symmetries required for this approach to be unbiased were believed, then one could likely estimate coefficients for latent wellbeing using binary models like logit and sample subsets of respondents who answered one of only two consecutive, non-focal response options.
Returning to Eq. (3), within each of the two ordered logit formulations nested in the model, identification of the set of parameters (with no constant term) is standard. This still leaves us with incomplete identification, in general, of the parameters on variables common to $x$ and $z$. One can imagine extreme distributions of SWB, for instance all near 10, in which FVR only acts to convert 9s to 10s. In this case, discriminating between the effect of common variables on latent wellbeing or FVR would not be possible, especially if the sign of coefficients is not constrained. However, more typically, with a broader SWB distribution, FVR will be distinguishable from effects on latent wellbeing. That is, successful identification rests on having sufficient independent, explainable variance in latent SWB across low types in order that there is also variation in their observed response. For instance, if the latent wellbeing of low types is sufficiently spread out that they sometimes round down and sometimes round up, then the influence of education on SWB, controlling for other influences, is separately identified from the influence of education on the reporting function, i.e., on the likelihood of being a low type. Put differently, identification comes from the response of the observed distribution to changes in numeracy, driven by some variable, being different from the response of the observed distribution to changes in latent wellbeing, driven by the same variable. This is assured when there is nontrivial variation in the latent wellbeing of low types. This conceptual argument is best corroborated quantitatively through simulation, which demonstrates, in Section 4, that $\beta_S$ and $\beta_N$ are simultaneously recovered when estimating Eq. (3).

¹⁷ Point-identification, typically referred to simply as "identification", is also called frequentist identification and is a frequentist concept. In Bayesian estimation, as is used in the empirics to follow, parameters are assumed to have distributions, not point values. Using a Bayesian estimation method with a broad enough prior, alternate sets of values which account for the data are simply reflected in multimodal (or suitably broad) estimates of the parameters (Lewbel, 2019).

Focal Value Rounding Index
As shown below, net biases on some estimated moments and model coefficients may be zero due to offsetting effects, even when FVR behavior is prominent. Therefore, to express straightforwardly the magnitude of the numeracy problem in a sample of respondents, another estimated value is helpful. This is the Focal Value Rounding Index (FVRI), which is an estimate of the fraction of the population who restrict their answer to a set of focal values, i.e., the estimated fraction of low types. This value is well identified whenever $\beta_N$ is.
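One simple way to compute such an index from fitted values is to average each respondent's estimated probability of being a low type. The sketch below is a plug-in calculation at point estimates, not the paper's posterior estimate, and it assumes the convention that the probability of being a high type is the logistic function of a numeracy index $z'\beta_N$. The index values are hypothetical.

```python
import math

def prob_low_type(zb):
    """Pr(low type) = 1 - logistic(zb), with zb the numeracy index z'beta_N (assumed convention)."""
    return 1.0 / (1.0 + math.exp(zb))

def fvri(numeracy_indices):
    """Sample-average probability of being a low (focal value rounding) type."""
    return sum(prob_low_type(zb) for zb in numeracy_indices) / len(numeracy_indices)

# Hypothetical numeracy indices for five respondents (illustration only).
sample_indices = [-0.5, 0.3, 1.2, 2.0, 3.1]
index_value = fvri(sample_indices)
print(f"FVRI = {index_value:.3f}")
```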

Counterfactual SWL distribution
The mixture model provides a posterior estimate of the latent SWB distribution, i.e., that which would have been reported had respondents all used the full scale. This represents a "correction" to the reported distribution of SWB. This is a distribution of predicted, counterfactual, discrete responses on the 0-10 scale, not an estimate of the latent variable $S^\star$. The next section demonstrates through simulation that the model successfully recovers (identifies) this counterfactual distribution, along with the FVRI, means, and coefficients.
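As a rough illustration, a plug-in version of the counterfactual distribution can be obtained by averaging the full-scale (high-type) ordered logit probabilities over all respondents, regardless of their type. This sketch evaluates the probabilities at fixed, hypothetical parameter values; the paper's estimate is a posterior quantity, which this simplification does not reproduce.

```python
import math

def logistic_cdf(t):
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def full_scale_probs(xb, alpha_h):
    """Ordered logit probabilities of responses 0..10 for wellbeing index xb."""
    cumulative = [logistic_cdf(a - xb) for a in alpha_h] + [1.0]
    return [c - p for c, p in zip(cumulative, [0.0] + cumulative[:-1])]

def counterfactual_distribution(xbs, alpha_h):
    """Average full-scale response probabilities over the whole sample."""
    totals = [0.0] * (len(alpha_h) + 1)
    for xb in xbs:
        for k, p in enumerate(full_scale_probs(xb, alpha_h)):
            totals[k] += p
    return [t / len(xbs) for t in totals]

# Hypothetical thresholds and wellbeing indices (illustration only).
ALPHA_H = [-4.0, -3.2, -2.5, -1.8, -1.0, -0.2, 0.6, 1.5, 2.5, 3.6]
dist = counterfactual_distribution([-0.4, 0.2, 0.8, 1.5, 2.3], ALPHA_H)
counterfactual_mean = sum(k * p for k, p in enumerate(dist))
print([round(p, 3) for p in dist], round(counterfactual_mean, 2))
```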

Model validation
This section, supplemented by several appendices, reports on the use of simulated data to validate the computational approach¹⁸ and the model's ability to identify simultaneous influences of a predictor, like education, on FVR and on latent SWB. A large battery of simulations demonstrates the complexity and scope of possible biases, due to FVR, in conventional estimates of SWB means and of marginal effects.¹⁹

Synthetic data validation results
Simulated SWB data are generated by a model in which a scalar $z$ partly determines numeracy through Eq. (2) while $z$ and a second scalar, $y$, partly determine the latent wellbeing $S^\star$. Thus $z$ represents education and $y$ represents other variables, such as income, which are not direct measures of numeracy (do not cause FVR). In order to reveal the possible scope of biases for plausible distributions of SWB, a number of parameters of the synthetic data generation process were varied systematically. These include χ, $\beta_N$, and two parameters determining the scale and offset of the cut points.²⁰ Figure 3 shows one example of a simulated distribution of SWB. In (a), shaded bars represent simulated responses on a 0 to 10 scale. Unlike in real data, we are able to identify which respondents (among those giving a 0, 5, or 10) used FVR. This portion of responses, labeled "low type", are shaded pink. Also because the data are synthetic, we are able to construct the latent ("true") wellbeing levels and thus the counterfactual 0-10 responses which would have been given if everyone reported without FVR. This counterfactual distribution, including both low and high types, is shown split into two groups based on education level. Although the true wellbeing distribution of this sample is centered around 7.5, equidistant from the focal values of 5 and 10, there is a net negative bias of −0.08 in mean reported SWB. This is because the distribution of the lower educated component is generally closer to "5" than to "10". Thus, the amount of rounding up is less than the amount of rounding down.
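The data-generating process described here can be sketched as follows. This is an illustrative simulation under assumptions of my own choosing, not the paper's specification: unit-spaced cut points for the full scale, focal thresholds at 2.5 and 7.5, a latent offset of 4.0, four education levels, and the convention that the probability of being a high type rises with $z$. Because the data are synthetic, both the reported and the counterfactual full-scale responses are available for every respondent.

```python
import math
import random

CUTS_FULL = [k + 0.5 for k in range(10)]   # assumed thresholds between 0, 1, ..., 10
CUTS_FOCAL = [2.5, 7.5]                    # assumed thresholds between 0, 5, 10
FOCAL = (0, 5, 10)

def count_below(cuts, s):
    """Number of thresholds below s, i.e. the ordinal category index."""
    return sum(1 for c in cuts if s > c)

def simulate(n, beta_z=1.0, beta_y=1.0, beta_n=1.0, seed=0):
    """Return rows (z, is_low_type, counterfactual_response, reported_response)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.choice((0, 1, 2, 3))                   # education level
        y = rng.gauss(0.0, 1.0)                        # e.g. standardized log income
        u = min(max(rng.random(), 1e-12), 1 - 1e-12)
        eps = math.log(u / (1 - u))                    # logistic disturbance
        s_star = 4.0 + beta_z * z + beta_y * y + eps   # latent wellbeing (offset is hypothetical)
        p_high = 1.0 / (1.0 + math.exp(-(0.5 + beta_n * z)))
        is_low = rng.random() >= p_high
        full = count_below(CUTS_FULL, s_star)          # response without FVR
        reported = FOCAL[count_below(CUTS_FOCAL, s_star)] if is_low else full
        rows.append((z, is_low, full, reported))
    return rows

rows = simulate(5000)
mean_true = sum(r[2] for r in rows) / len(rows)
mean_reported = sum(r[3] for r in rows) / len(rows)
share_low = sum(1 for r in rows if r[1]) / len(rows)
print(f"share of low types: {share_low:.3f}")
print(f"bias in mean reported SWB: {mean_reported - mean_true:+.3f}")
```

Varying the offset, the spacing of the cut points, and `beta_n` changes both the size and the sign of the bias in the mean, in the spirit of the systematic parameter variation described above.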
Simulated biases in regression coefficients are obtained by estimating a traditional ordered logit model on the synthetic data, and comparing those estimates to the true values used in constructing the data, $\beta^z_S = \beta^y_S = 1$. In the case shown in Figure 3, these biases are also both negative, namely −14% and −23% respectively.²¹ Simulations were carried out for a wide range of parameters, generating cases with both positive and negative biases on mean LS and on $\beta^z_S$ much larger than in this example. Simulated biases on $\beta^y_S$, by contrast, tend to be negative.²² More generally, the bias on mean LS can be as large as ±2 points (see Proposition E.1 in Appendix E) and the bias on $\beta^z_S$ may be even larger (Appendix Proposition E.1). In all cases, the true distribution, fraction of low-types, and effects of $z$ and $y$ on latent wellbeing are identified and correctly estimated by the FVR mixture model. As an example, Figure 3(b) shows estimated coefficients and cut points for the same case shown in (a).

Variance of SWL ("happiness inequality")
Although not a focus of this paper, it is also worth noting that variance of SWB, which has attracted interest as a measure of inequality (Goff et al., 2018; Hasegawa and Ueda, 2011; Stevenson and Wolfers, 2008), also suffers from bias due to FVR, as of course do other moments and other measures of dispersion. For a relatively narrow distribution of LS centred around 5, FVR behavior decreases the variance. For a wider distribution, focal values of 0 and 10 would become prominent, and the variance could be biased upwards instead.
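A small numerical example illustrates both directions of the variance bias. It assumes, for simplicity, that a fixed share of respondents round their full-scale answer to the nearest focal value (0-2 to 0, 3-7 to 5, 8-10 to 10); this rounding rule and the two distributions are hypothetical stand-ins for the model's low-type thresholds.

```python
def round_to_focal(k):
    """Map a 0..10 response to the nearest of 0, 5, 10 (assumed rounding rule)."""
    if k <= 2:
        return 0
    return 5 if k <= 7 else 10

def with_fvr(dist, share_low):
    """Reported distribution when a share of respondents round to focal values."""
    reported = [(1.0 - share_low) * p for p in dist]
    for k, p in enumerate(dist):
        reported[round_to_focal(k)] += share_low * p
    return reported

def variance(dist):
    mean = sum(k * p for k, p in enumerate(dist))
    return sum(p * (k - mean) ** 2 for k, p in enumerate(dist))

# Narrow distribution centred on 5, and a wide (uniform) one.
narrow = [0, 0, 0, 0, 0.25, 0.5, 0.25, 0, 0, 0, 0]
wide = [1 / 11] * 11

narrow_true, narrow_reported = variance(narrow), variance(with_fvr(narrow, 0.3))
wide_true, wide_reported = variance(wide), variance(with_fvr(wide, 0.3))
print(f"narrow: {narrow_true:.2f} -> {narrow_reported:.2f}")
print(f"wide:   {wide_true:.2f} -> {wide_reported:.2f}")
```

In the narrow case rounding pulls 4s and 6s onto 5 and shrinks the variance; in the wide case it pushes mass out to 0 and 10 and inflates it.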

Empirical estimates of FVR bias and FVRI
With the above evidence of parameter identification from simulated data, the rest of this paper turns to empirical estimates. The distributions of LS for different levels of education, shown in Figure 1, indicate the significance of focal value rounding behavior in the CCHS sample. Using the mixture model, the role of education in supporting LS can be estimated, despite the strong relationship between education and the focal value bias. Columns (1) and (2) of Table 1 show the results of conventional, or "naive" estimates of the following simple individual-level cross-sectional OLS model for LS, as well as its ordered logit counterpart. Educational attainment is captured by a set of cumulative dummies, so that $\beta^h_j$ is the impact of having completed education level $j$ or higher.
The naive estimated coefficient on completing secondary education is near-zero or distinctly negative in the two estimates. The ordered logit coefficients predict that completion of high school reduces the odds of a higher LS by more than 7%, and that even a university education reduces those odds by nearly 4% as compared with someone who has less than a high school education. These values are economically large; using the simultaneously-estimated coefficient on log income, the former effect is estimated to be equivalent to a 13% reduction in income.²³ When constrained to disallow focal value behavior, the mixture model's estimate is shown in column (3) of Table 1. When the full model is estimated, a significantly positive value (∼0.06) is found for the LS benefit of completion of secondary school, and an additional 0.17 for those completing post-secondary.
The bias in a conventional estimate of the income coefficient is also large: the mixture model strongly rejects the naive estimated value of ∼0.53, in favor of a value of ∼0.62.This represents a 17% difference in the most studied value in happiness economics.Combining these coefficients implies that, after controlling for income, the true benefit of college completion, as compared with an otherwise-similar respondent without high school completion, is equivalent to an additional 45% of income.High school completion by itself confers a benefit equivalent to more than 10% of income, after controlling for differences in actual income.
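The income equivalents quoted in this section can be reproduced with a short calculation. The sketch assumes that an effect b on the latent LS index is converted to a proportional income change as exp(b / b_income) − 1, where b_income is the coefficient on log income; this matches the arithmetic in footnote 23. The coefficients are the rounded values quoted in the text (−0.075 and 0.53 for the naive ordered logit; 0.06, 0.17, and 0.62 for the mixture model), and treating college completion as the sum 0.06 + 0.17 is an assumption about how the cumulative dummies combine.

```python
import math

def income_equivalent(effect, income_coef):
    """Proportional income change with the same effect on the latent LS index."""
    return math.exp(effect / income_coef) - 1.0

# Naive ordered logit: high school completion, relative to less than high school.
naive_high_school = income_equivalent(-0.075, 0.53)

# Mixture model, using the rounded coefficients quoted in the text.
corrected_high_school = income_equivalent(0.06, 0.62)
corrected_college = income_equivalent(0.06 + 0.17, 0.62)   # cumulative dummies summed (assumption)

# Relative change in the income coefficient itself.
income_coef_change = 0.62 / 0.53 - 1.0

print(f"naive high school:     {naive_high_school:+.1%}")
print(f"corrected high school: {corrected_high_school:+.1%}")
print(f"corrected college:     {corrected_college:+.1%}")
print(f"income coefficient:    {income_coef_change:+.1%}")
```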
The specification in Table 1 includes both education and income as predictors of FVR. Appendix Table F.2 shows that alternate models with only education in the FVR equation, or with additional controls, give highly consistent results.
Next to Table 1 are visualizations of several sets of distributions, showing the model's ability to predict observed response patterns while estimating the distribution of "underlying" or "true" SWB.

Applications
Hundreds of empirical papers estimating models of life satisfaction and other extended-Likert-like scales could be revisited in light of the significant possibility of biases identified above. Those focusing on effects of socioeconomic status, gender, and age, and those which particularly address populations with low levels of numeracy, especially invite reanalysis. Here I reproduce estimates from three papers to exemplify the important changes that may result from such analysis, and to show that the often-reproduced "paradox" of negative benefits to education may be largely resolved by the cognitive mixture model.
6.1 U.K.: Clark and Oswald (1996)
The first of these papers, with over 1500 citations, is a relatively early contribution in the modern study of relative income concerns but also prominently points out the anomalously low estimated returns to wellbeing from education (Clark and Oswald, 1996). It was also recently cited as one of 11 studies in the major accumulated evidence on the life satisfaction benefits from additional education (Clark et al., 2019, see Annex 3a). In fact, the paper uses data from the British Household Panel Survey (BHPS) prior to its inclusion of LS, so it uses instead responses to 7-point satisfaction with pay and satisfaction with job questions. Clark and Oswald (1996) did not examine the distributions of these subjective response variables according to formal educational attainment. 24 Doing so reveals dramatic FVR behavior which roughly diminishes with education (Figure 4). The distribution of satisfaction with pay is wider and more central (i.e., near "4") than that of job satisfaction, and features unmistakable evidence of all three focal values (1, 4, and 7) for groups with lower academic certifications. For job satisfaction, the upper focal value is most obvious but all three are evident on inspection. If those with A-levels but no College are excluded, then the group means and the prevalence of each focal value all decrease monotonically with education.
exhibit a blend of the bias features attributed to z and y in these simulations.
23 The values in this paragraph are calculated as $e^{-0.075} - 1 = -0.072 \approx -7\%$; $e^{-0.075 + 0.038} - 1 = -0.036 \approx -4\%$; and $e^{-0.075/0.53} - 1 = -0.13 \approx -13\%$.
24 The description from Clark and Oswald (1996) reads: "Table 5 contains two ordered probits, in each of which three dummies for educational attainment are included as well as a control for income. The dummies are for a college degree, advanced high school (A-level approximately), and intermediate high school (O-level approximately). The omitted category is for no or low qualifications. These four categories are for achieved paper certificates and not merely for years of schooling".
Table 2 shows raw coefficients for model estimates of overall satisfaction with job. The first three models are conventional estimation approaches, including an ordered probit model, which nearly reproduces the published values25 and retained sample size (4730 in all my estimates) of the main estimate in Clark and Oswald (1996, Table 5). 26 In ordered probit, OLS, and ordered logit models, academic attainment is strongly predictive of lower satisfaction after adjusting for log of household income. The implied effect is enormous. As compared with someone with primary education only, an advanced high school graduate (A-levels) is less satisfied with their job by as much as they would be with a 3-fold decrease in wage. 27 As already shown in Figure 4, even the raw mean job satisfaction is decreasing across the first three education groups. Clark and Oswald (1996) speculate that their findings of low satisfaction of the higher educated may be related to a recent recession that particularly hit the middle class in the UK, but also cite several earlier studies which corroborate the negative or negligible benefits from education on job satisfaction.
Equally surprising in these results is the nil effect of income on job satisfaction. The 95% confidence interval for the coefficient of log income in column (3) is −0.10 to +0.13, with the upper limit implying that a doubling of income would increase the odds of a higher satisfaction response by less than 10%.
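The "less than 10%" statement can be checked directly. The following sketch assumes, as the reference to odds suggests, that the coefficient comes from a logit-type model with log income as the regressor; the function name is illustrative.

```python
def odds_ratio_for_income_multiple(coef_log_income, multiple):
    """Odds ratio implied by multiplying income by `multiple`.

    In a logit model with log(income) as regressor, the log-odds change is
    coef * log(multiple), so the odds ratio is multiple ** coef.
    """
    return multiple ** coef_log_income


# Confidence limits quoted in the text for the log income coefficient.
upper = odds_ratio_for_income_multiple(0.13, 2.0)
lower = odds_ratio_for_income_multiple(-0.10, 2.0)
print(f"doubling income, upper limit: {upper - 1:+.1%}")
print(f"doubling income, lower limit: {lower - 1:+.1%}")
```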
Column (4) simply shows that the cognitive mixture model reproduces an ordered logit estimate when focal value behavior is turned off, while the key result lies in Column (5). When focal value behavior is accounted for, the income coefficient increases to a confidently positive value, and the strongly negative coefficients on O-level and College completion are eliminated. Respondents who finished A-levels but stopped there for some reason, i.e., did not complete college, are still predicted to be less satisfied with their jobs, but the effect is half as large as in the naive model. Estimates of other coefficients remain statistically unchanged. Both formal education and reported income prove significant in predicting focal value behavior. The estimated fraction of respondents, overall, who restricted their answers to focal values is 28%. The model also estimates a significant bias in the mean reported job satisfaction, from a latent value of 5.3 which would have obtained had all respondents used the full scale, to the observed value of 5.5. The model estimates that the low-numeracy (FVR) respondents reported an average job satisfaction of 5.9, and that the high-numeracy respondents reported an average of 5.3.
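Why the mixture model collapses to an ordered logit when focal value behavior is switched off can be seen in a stylized version of its response probabilities. The sketch below is an assumption-laden illustration, not the paper's specification (Appendix Equation A.1 is not reproduced here): it treats the FVR probability as a fixed number `p_low` rather than a function of covariates, and the cut points and latent index are made-up values.

```python
import math


def logistic_cdf(x):
    return 1.0 / (1.0 + math.exp(-x))


def ordered_logit_probs(index, cuts):
    """P(response = k) for k = 0..len(cuts), given a latent index and increasing cut points."""
    cdf = [logistic_cdf(c - index) for c in cuts] + [1.0]
    probs, previous = [], 0.0
    for value in cdf:
        probs.append(value - previous)
        previous = value
    return probs


def mixture_probs(index, cuts_high, cuts_low, focal, p_low):
    """Blend a full-scale ordered logit with one restricted to focal values.

    cuts_high: cut points for the full scale (one fewer than the number of responses)
    cuts_low:  cut points between consecutive focal values
    focal:     0-based positions of the focal values on the full scale
    p_low:     probability that a respondent answers with focal values only
    """
    high = ordered_logit_probs(index, cuts_high)
    low = [0.0] * len(high)
    for position, prob in zip(focal, ordered_logit_probs(index, cuts_low)):
        low[position] = prob
    return [(1.0 - p_low) * h + p_low * l for h, l in zip(high, low)]


# Hypothetical 7-point scale with focal values at the first, middle, and last response.
cuts_high = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]
cuts_low = [-1.0, 1.0]
focal = [0, 3, 6]

print([round(p, 3) for p in mixture_probs(0.3, cuts_high, cuts_low, focal, 0.28)])
print(mixture_probs(0.3, cuts_high, cuts_low, focal, 0.0) == ordered_logit_probs(0.3, cuts_high))
```

With `p_low` set to zero the second component drops out and only the conventional ordered logit remains, which is the sense in which the constrained model should reproduce the naive estimate.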
Table 3 parallels Table 2 but relates to the other column in Clark and Oswald (1996)'s Table 5: an estimate for satisfaction with pay rather than with the job overall. In this case, increased income is a strong predictor of satisfaction even in naive estimates. On the other hand, higher education again strongly predicts lower satisfaction, after adjusting for household income, in conventional models. This may make sense if the primary effect of education in this context is to set expectations about pay. In any case, for satisfaction with pay, the FVR mixture model corroborates the estimates of the naive ordered logit model.
How can the coefficient estimates remain relatively unchanged in the presence of such a high degree of FVR? While column (5) of Table 3 shows that income and higher education levels predict lower propensity for FVR, and that 31% of respondents used a simplified response scale for answering this question, the net effect of FVR on the estimated coefficients is small. This can be understood by considering the distribution of latent wellbeing values, with reference to the discussion in Section 4.1 and the Remark for Appendix Proposition E.2. For this sample, the number of respondents rounding up from 6 to 7 or from 3 to 4 is balanced by the number rounding down from 2 to 1 or from 5 to 4. 28 Appendix Figure F.3 shows the estimated distributions of responses which would have been given in the absence of any FVR (second row), for both job and pay satisfaction. All education levels exhibit broad distributions of latent pay satisfaction, and all carried out some degree of FVR.
28 As discussed earlier and as this example shows, there is no simple relationship between the extent of FVR and the size of net biases, due to the possibility of offsetting contributions to bias and the importance of detailed distributional features of the sample. It is also worth noting that the model is capable of accounting for a high fraction of extreme values (1s and 7s, here) as scale boundary effects rather than FVR. Indeed, it is also capable of accounting for a central peak (here, at 4) without appealing to the existence of any FVR. Instead, the model estimate suggests that respondents were simplifying the scale, and that the independently-estimated fractions of respondents who did so were nearly the same (28% and 31%) for the two questions.
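The offsetting of upward and downward rounding can be illustrated in its simplest form, the sample mean. The sketch below assumes that FVR respondents round to the nearest focal value and that the propensity to round is independent of the latent response; both are simplifications for illustration, and the counts are hypothetical.

```python
FOCAL_VALUES = (1, 4, 7)  # focal values on the 7-point scale


def round_to_focal(response, focal=FOCAL_VALUES):
    """Map a full-scale response to the nearest focal value (assumed rounding rule)."""
    return min(focal, key=lambda f: abs(f - response))


def mean_shift(latent_counts, fvr_share, focal=FOCAL_VALUES):
    """Observed mean minus latent mean when a share `fvr_share` of respondents round.

    latent_counts maps each latent response to the number of respondents holding it.
    """
    n = sum(latent_counts.values())
    shift_if_all_round = sum(
        (round_to_focal(r, focal) - r) * count for r, count in latent_counts.items()
    ) / n
    return fvr_share * shift_if_all_round


# Hypothetical latent distributions (not data from the paper).
balanced = {2: 10, 3: 10, 5: 10, 6: 10}  # as many round up as round down
top_heavy = {5: 10, 6: 30}               # more respondents round up to 7 than down to 4

print(mean_shift(balanced, 0.31))
print(mean_shift(top_heavy, 0.31))
```

In the balanced case a large FVR share leaves the mean untouched, while the same share produces an upward bias when the latent distribution sits just below the top focal value.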

More recently, Powdthavee et al. (2015) have shed some further light on the apparent negative or insignificant returns to education in life satisfaction regressions. They articulate a more considered causal model for the impact of educational attainment on overall life evaluations, taking into account several of the multiple non-monetary channels through which education is expected or known to affect life. In particular, they allow for mediating effects of education through health, marriage, child-rearing, and employment, in addition to income. They conclude that "education is likely to be positively related to overall life satisfaction through many different channels, even when ceteris paribus education itself has a negative and statistically significant relationship with overall life satisfaction". Thus, while identifying some positive indirect effects of education, their analysis does not account for the overall negative effect of education on life satisfaction. Here I do not integrate their panel data mediation pathways into the FVR model, which would go beyond the scope of this paper. Instead, I use one cycle (2010) of the HILDA survey (see Powdthavee et al., 2015) to test the same questions as above: how much of the negative overall association between education and life satisfaction is accounted for by FVR behavior? And how biased is the income coefficient when FVR is ignored?
Figure 5 shows a familiar pattern in weighted life satisfaction response distributions when separated by education level. Here the focal value enhancements are more subtle, but anomalously high response fractions for 5 and 10 are noticeable at least in the lowest education group, and the proportions of each focal value decrease across education groups.
Table 4 shows the comparison in now-familiar form of the naive estimates of income and education effects on life satisfaction in Australia (columns 1, 2, and 3) with an estimate of the FVR model in column (4). In the FVR-aware model, the coefficient on income approximately doubles, jumping by 4 standard errors. The additional effect of college degree attainment after finishing high school becomes weakly positive, and the effect of high school graduation climbs by 5 standard errors.

First Nations and Métis in Canada
Next I pick on my own prior work by re-examining a paper which reported an anomalously low benefit of income for a sample of Indigenous (First Nations and Métis) peoples in Canada (Barrington-Leigh and Sloman, 2016). In addition to estimating a negative effect of income on life satisfaction, we found an average life satisfaction among Indigenous respondents that was equivalent to that of the general population, despite the stark objective challenges faced by the former groups, including disproportionate levels of discrimination and socioeconomic disadvantage with respect to the rest of the Canadian population. Barrington-Leigh and Sloman (2016) suggested as a possible interpretation that total income is not well measured by the standard income question for this group, but remained "cautious and skeptical" about the data overall.

Figure 6. Life satisfaction of Indigenous Canadians (left panels) and the whole population (right panels). The second row shows similar patterns in the nation-wide General Social Survey. While the Aboriginal ESC is a distinct sample from the General ESC, the panel labeled "Aboriginal GSS" is simply a subset of the full GSS sample. Horizontal axis: satisfaction with life (SWL). Panel annotation for the General GSS: SWL = 7.62 ± 0.04, N = 13k.
This case study relates to the importance of being able to use life satisfaction data across diverse cultural and economic circumstances. It also demonstrates the use of the mixture model on a small sample. The data come from two Canadian surveys: the national Equality, Security and Community survey (General ESC, N = 3725) and its follow-up small sample of on-(70%) and off-reserve (30%) First Nations and Métis peoples29 in the Canadian Prairies (Aboriginal ESC, N = 446). As can be seen in the first panel of Figure 6, an enhancement at LS = 10 in the Aboriginal ESC sample makes it the modal response value and may go some way to explaining the high mean reported LS. Indeed, this is likely the first report of a LS distribution with such a dominant response at its top value. On the other hand, respondents also gave plenty of 7s, 8s, and 9s, each with higher frequency than LS = 5. Below I use the cognitive mixture model to assess how much this distribution might be biased by FVR, and whether the anomalous estimates in Barrington-Leigh and Sloman (2016) are reversed.
The first column of Table 5 shows a conventional ordered logit estimate of 10-point life satisfaction of the Aboriginal sample. For consistency with Barrington-Leigh and Sloman (2016), the education variable is a more continuous variable than in the previous two applications, being measured on ten steps ranging from no primary school to a PhD or professional degree. Once again, and despite a sample size of only 446, a significantly negative coefficient on education shows that, after adjusting for income, those with higher education report lower life satisfaction. In addition, the coefficient on log household income is estimated to be most likely negative, with a 95% confidence interval between −0.40 and +0.08.
The second column reports the estimate of a cognitive mixture model. Education strongly predicts numeracy, i.e., use of the full response scale. Most importantly, the education anomaly in the earlier analysis is resolved when FVR is taken into account: the confidently negative education coefficient is replaced by a weakly positive point estimate with a 95% confidence interval between −0.08 and +0.17. The weaker anomaly of a negative income coefficient is also partly resolved; in its place is one centered closely on zero with similar precision.
In order to address the surprisingly high average life satisfaction reported by Indigenous respondents, I next use a pooled model to compare groups after controlling for income and education. Pooled estimates of the Canada-wide respondents and the First Nations/Métis sample are shown in Columns (3) and (4) of Table 5. Adjusting for income and education, the Aboriginal respondents report 0.30 higher life satisfaction than non-Aboriginal. Although the explanatory variables here are few and the model is simple, this positive boost is counterintuitive for the reasons described above. However, when FVR is accounted for (Column 4), this situation is reversed, with the Aboriginal respondents reporting a weakly lower life satisfaction than others with similar income and education. In this model, education, income, and Aboriginal status are all allowed to predict FVR behavior. Higher education and income predict a lower propensity for FVR behavior, as expected, while Aboriginal status has the equivalent effect on FVR as a two-point reduction in educational attainment level, for instance from completing a technical or community college certification to completing only high school.
For the pooled sample, the mixture model estimates a 70% higher income coefficient and corrects the strongly negative education effect of the naive model estimate with a weakly positive one.

Ranking of U.S. states by happiness
The United States is somewhat exceptional in that there are no prominent domestic surveys assessing subjective wellbeing with more than a 4-point response, 30 with the exception of the Gallup Daily Poll, which poses the Cantril Ladder question on an 11-point scale.
In this section, I investigate the extent to which a ranking of states by average reported life evaluations is biased by focal-value response behavior. I find that state-level differences in educational attainment relate to state-level differences in FVR. Applying the cognitive mixture model to these data provides a counterfactual "latent" or "corrected" mean life evaluation for each state, allowing for a comparison between a naive ranking and a corrected ranking of states.
Ranking of happiness around the world garners considerable attention, with over one million visits and downloads of the World Happiness Report each year. Below, the USA case demonstrates that a bias in rankings occurs when mean responses are taken at face value.
Figure 7(a)'s horizontal axis shows the distribution of state mean responses to the Cantril Ladder framing of life evaluation 31 in the 2019 (final) wave of the Gallup Daily Poll. Counterintuitively, these means are uncorrelated with the fraction of respondents in each state who provided the answer "10" on the 0-10 scale (vertical axis). Figure 7(b) gives some suggestion as to why. States with higher high school completion rates have lower tendency to answer "10". Figure 7(c) and (d) show an example of how much states can differ in terms of FVR. The weighted response distribution for Mississippi, which has a high incidence of answer "10", is remarkably different from that of Washington DC,32 with the lowest incidence, even though their mean responses are similar. With this motivation, Appendix Table F.7 presents estimates of a version of the mixture model (Appendix Equation A.1) explaining individual responses with x = z comprised of the logarithm of household income, along with a set of indicators for a five-level educational attainment question. As before, several parameters and posteriors of interest are: the fraction (FVR) of respondents estimated to be using a simplified focal value scale; a mean of the latent life evaluation which would have been observed had all respondents chosen to use the full scale; and coefficients for the effect of income and education levels on the underlying (latent) life evaluations. These values are estimated separately for each state and can be compared in Appendix Table F.7 to the naive model, equivalent to an ordered logit, in which focal value behavior is not acknowledged.
Figure 8 presents the distributions of biases in mean life evaluation and effects of high school completion and family income on life evaluations, obtained by comparing the ordered logit and mixture models. It shows that the Cantril Ladder question is in most states estimated to elicit highly positively-biased responses. In other words, the effect of "rounding up" to 10 (or to 5) outweighs any rounding down to 5 (or to 0), and is large. In many cases, the raw mean report is 0.1-0.2 higher than that inferred with the focal value correction, which is large given that the standard deviation of Cantril ladder means is 0.13 among states, and the standard deviation of individual responses nationally is only 1.89. This bias is larger for states with lower educational attainment.

Figure 8. Distribution across states of bias in mean life evaluations (in units of the 0-10 scale) and in estimated effects (in odds ratios as a percentage) on life evaluations. For instance, an odds ratio of 95% in the log(income) plot means that the bias results in a 5% reduction in the odds of being one level higher on the 0-10 scale, all else equal, in response to a unit increase in log(income).

30 However, two international datasets, the World Values Survey and the Gallup World Poll, do so on 10 and 11 point scales, respectively.
31 See Appendix G.7 for the precise wording of the question.

Figure 8 also shows that the distributions of biases in education effects and in income effects are both uniformly downwards at the state level. Reassuringly, the mixture-model estimated effects of educational attainment on wellbeing are overwhelmingly positive after the correction (Appendix Table F.7).
Lastly, Figure 9 presents state rankings for both the raw reported life evaluation and the estimated latent life evaluation. The overlapping estimate ranges reflect the typically imprecise nature of this kind of ranking, especially given the small sample size in some states (see Appendix Table F.6). There is also significant consistency (correlation 0.70) between the corrected and uncorrected rankings. Nevertheless, the shifts are considerable: more than a quarter of states shift by more than a quartile in the distribution (despite the overall correlation), 65% of states shift positions by 5 or more, and 37% shift by 10 or more.
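Summary statistics of the kind quoted above (the share of states moving at least a given number of rank positions) can be computed from two vectors of state means. The sketch below shows one way to do so; the helper names and the five example values are hypothetical and are not the estimates behind Figure 9.

```python
def ranks(values):
    """Rank positions (1 = highest value); ties are broken by original order."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    position = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        position[i] = rank
    return position


def share_shifting(raw_means, corrected_means, min_shift):
    """Fraction of units whose rank changes by at least `min_shift` positions."""
    raw_rank = ranks(raw_means)
    corrected_rank = ranks(corrected_means)
    moved = sum(abs(a - b) >= min_shift for a, b in zip(raw_rank, corrected_rank))
    return moved / len(raw_means)


# Hypothetical raw and corrected means for five units (illustration only).
raw = [7.0, 6.9, 6.8, 6.7, 6.6]
corrected = [6.7, 6.8, 6.75, 6.65, 6.5]

print(ranks(raw), ranks(corrected))
print(share_shifting(raw, corrected, 1), share_shifting(raw, corrected, 2))
```

Because every corrected mean can fall while ranks still reshuffle, the rank comparison captures differences in the size of the bias across units rather than its average level.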

Discussion and conclusion
The contributions of this paper are to (i) explain a prominent feature of many subjective scale response distributions as the result of respondents simplifying the scale; (ii) identify education and other proxies of numeracy as predictors of this "focal value rounding" behavior; (iii) formulate a model and estimation strategy for predicting life satisfaction responses from individual and contextual circumstances which properly takes into account a mixture of reporting behavior used by respondents; (iv) explore theoretically the biases possible due to the effect; (v) provide a way to estimate the degree (FVR) of focal value rounding behavior; and (vi) demonstrate the application of the estimation method and its significant impact for four published studies and surveys.
Clark and Oswald (1996) write "Counter to what neoclassical economic theory might lead one to expect, highly educated people appear less content. The effect is monotonic and well-defined". This contradiction with neoclassical economic theory has generally held up to subsequent analysis over two decades but is partly resolved with the model described here, which takes into account a conspicuous empirical feature of the subjective wellbeing response function.
Income effects have been a focus in the study of wellbeing in economics since the field's inception, and an enormous literature exists around the magnitude of the income coefficient (e.g., Easterlin, 1974; Deaton, 2008; Clark et al., 2008; Dolan et al., 2008; Easterlin, 1995, 2013; Ferrer-i-Carbonell, 2005; Kapteyn et al., 1978; Luttmer, 2005; Senik, 2005; Van Praag and Kapteyn, 1973). Almost every economic study of LS includes an estimate of the income effect, and typically other influences on life satisfaction are quantified in terms of their income "compensating differential", i.e., the ratio between a coefficient of interest and the coefficient on income. Thus, the large corrections estimated here for the income coefficient indicate that material supports are slightly more effective for raising human wellbeing, as compared with the other (especially social) dimensions of life, than the literature has shown so far. According to the simulations, some downward bias can also occur for those other coefficients, especially when those dimensions of life are correlated with education, but there is little empirical evidence for this in the estimates carried out in this paper.
One next step for research is to examine international and cultural patterns in response functions. Effects will differ across countries according to where the average LS level lies on the scale, and according to the income and education distribution. There may be additional international differences in the tendency to use focal values. Therefore, using the mixture model approach, both differences in education systems and more cultural drivers of FVR can be incorporated into international comparisons of LS. Flexibly modeling each possible LS response so as to allow for non-ordinal relationships between them, carried out here using multinomial logit, is a good starting point for detecting such response biases driven by cultural norms as well as numeracy. Despite the general evidence of good comparability of LS patterns across cultures (Helliwell et al., 2010), it may still be possible to identify response biases towards central values or away from "extreme" values. One natural extension of the model described in this paper is to allow for the inclination to round (FVR) to vary separately for each focal value, effectively creating a mixture of eight "types" in the case of three focal values.
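The count of eight types follows from letting each of three focal values be either used or not used as a rounding target. The enumeration below is a sketch under that interpretation, which is an assumption about how the extension would be parameterized; the focal values 0, 5, and 10 are those of an 11-point scale.

```python
from itertools import product


def rounding_types(focal_values):
    """Enumerate respondent types as the subsets of focal values they round to."""
    types = []
    for flags in product((False, True), repeat=len(focal_values)):
        types.append(tuple(f for f, used in zip(focal_values, flags) if used))
    return types


types = rounding_types((0, 5, 10))
for t in types:
    print(t if t else "full scale (no rounding)")
print(len(types))
```

The empty subset corresponds to respondents who use the full scale, and the complete subset to the single FVR type of the model estimated in this paper.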
A deeper analysis of panel data will also be important, through an extension to incorporate fixed effects into the model developed in this paper. Preliminary analysis of panel data with a 5-point scale for LS, treating values 1, 3, and 5 as focal values, shows that the probability of LS changing from the middle value is decreasing in education. Traditional first-differences approaches for panel fixed effects are invalid because, for instance, the dependence of the 3 → 4 transition is not the mirror of the 4 → 3 transition.
Another extension of the model used in this paper will be to incorporate instrumental variables. Fortunately, this is relatively straightforward in Bayesian estimation frameworks, in which a single-step estimation procedure for instrumental variables is natural, subject to the normal exclusion restrictions (Drèze, 1976; Kleibergen and Zivot, 2003). As a proof of principle and in light of the descriptive evidence, this paper focuses on the idea of numeracy and on education as a primary predictor of FVR. Understanding the role of secondary influences, such as other demographic variables, fatigue, the cost of time, or motivation with respect to the survey, may help to identify other biases or to design better surveys.
Survey and questionnaire interface design is a further topic of future work. While the present study carries out an ex post determination of how respondents have used a subjective numerical scale, it may make sense to give respondents this choice up front. An interactive survey interface could dynamically offer different degrees of precision or resolution in responses, thus accommodating variation in cognitive capacity and other differences in the confidence of respondents' answers. Open-ended graphical scales may be one means to accomplish this, but further research into ways to elicit a statement of precision from respondents would be valuable. The potential for creativity and innovation is high, given the increasing availability of technology during an interview.
Depending on one's perspective, the present findings on response behavior, happiness income coefficients, and mean response biases may be taken as a warning of how difficult it would be to realize the most ambitious implementations of LS as a guide to policy (Frijters et al., 2020; Frijters and Krekel, 2021; Barrington-Leigh and Escande, 2018; Barrington-Leigh, 2016; Happiness Research Institute, 2020; Barrington-Leigh, 2021; UK Treasury, 2021; MacLennan et al., 2021) or, conversely, as another reassuring example of the robustness of LS inference to potential flaws inherent in its cognitive complexity, and possibly even a defense of the rough magnitudes of estimated effects that have become so reproducible in study after study. I take away both of these messages.

Figure 2. Multinomial logit estimate of individuals' probability of giving each possible response to the life satisfaction question (CCHS data). For educational attainment (quantified on a 1-4 scale), marginal effects show a monotonic pattern with increasing response value, except for the remarkable outliers at 0, 5, and 10. These indicate that education and, simultaneously but to a lesser degree, income are significant predictors of the tendency to use a simplified response scale. Error bars denote 95% confidence intervals.
Figure 3. Example of synthetic data and validation. Panel (a) shows simulated and latent responses for one set of synthetic parameters. Overall in this example, 33% of respondents are low type. They reported an average SWB of 6.8, rather than their true average of 7.0, while high types reported an average of 7.7. Panel (b) shows that the FVR mixture model correctly recovers synthetic coefficients β S and cut points α H and α L. Vertical dashed lines show the true (data generating process) values. Most cut points are precisely estimated, but the lowest ones are poorly constrained because there are few low responses for this particular set of synthetic parameters.

Figure 4. Distributions of satisfaction with job and with pay for different categories of educational attainment.

Figure 9. Observed and corrected U.S. state rankings. Error bars show 95% confidence intervals. States without at least one of each possible response to the SWB question are omitted.


Table 1. Estimates of life satisfaction in CCHS. The first two columns show conventional, "naive" estimates (as raw coefficients) of a model explaining life satisfaction with just education and household income. Column (3) shows estimates from a degenerate version of the mixture model constrained to exclude FVR behavior. Column (4) shows the unconstrained mixture model estimate, with significantly higher effects of education and income on life satisfaction. Histograms show response distributions split up (and colored) by education. The top plot is observed values, with the model's inferred overall latent distribution shown by a dashed line. The second and third show latent (or "corrected") and predicted responses.

Table 2. Estimates of job satisfaction in BHPS. Raw coefficients are shown. Education indicators identify mutually exclusive groups in comparison to those with less than O-levels.

Table 3. Estimates of satisfaction with pay in BHPS. Description as for Table 2.

Table 4. Estimates of life satisfaction in the 2010 cycle of HILDA.

Table 5. Estimates of life satisfaction of Indigenous Canadians.