Misreported Schooling, Multiple Measures and Returns to Educational Qualifications

We provide a number of contributions of policy, practical and methodological interest to the study of the returns to educational qualifications in the presence of misreporting. First, we provide the first reliable estimates of a highly policy-relevant parameter for the UK, namely the return from attaining any academic qualification compared to leaving school at the minimum age without any formal qualification. Second, we provide the academic and policy community with estimates of the accuracy and misclassification patterns of commonly used types of data on educational attainment: administrative files, self-reported information close to the date of completion of the qualification, and recall information ten years after completion. We are in the unique position to assess the temporal patterns of misreporting errors across survey waves, and to decompose misreporting errors into a systematic component linked to individuals' persistent behaviour and a transitory part reflecting random survey errors. Third, exploiting the unique nature of our data, we assess how the biases from measurement error and from omitted ability and family background variables interact in the estimation of returns. On the methodological front, we propose a semi-parametric estimation approach based on balancing scores and mixture models, in particular allowing for arbitrarily heterogeneous individual returns.


Introduction
Focusing on the returns to educational qualifications when attainment is potentially misreported, this paper offers a two-fold contribution. First, it provides reliable estimates of a highly policy-relevant parameter for the UK, namely the return from attaining any academic qualification compared to leaving school at the minimum age without any formal qualification. Second, it estimates misclassification probabilities and patterns of misclassification, including the temporal correlations in misreporting by individuals across survey waves. These results are obtained by casting the identification and estimation problem in terms of a mixture model, and by using a semi-parametric estimation approach.
The measurement of the return to education, that is of the individual wage gain from investing in more education, has become probably the most explored and prolific area in labour economics. Two important and interrelated issues arise as to the measurement of education. A first question is whether we can summarize it in the single, homogeneous measure of years of schooling. Although particularly convenient, this "one-factor" model assumes that the returns increase linearly with each additional year, irrespective of the level and type of educational qualifications the years refer to. In the US, grades generally follow years, and it has long been argued that the returns to an additional year are reasonably homogeneous (see for example Card, 1999). In the UK and other European countries, however, there are alternative nationally-based routes leading to quite different educational qualifications, and the importance of distinguishing between different types of qualifications is widely accepted. Blundell, Dearden and Sianesi (2005b) highlight the potential shortcoming of the "one-factor" model when applied to the UK's educational system, in which individuals with the same number of years of schooling have quite different educational outcomes. Not only would this obfuscate the interpretation of the return to one additional year, but imposing equality of yearly returns across educational stages was found to be overly restrictive.
A second important issue in the measurement of education, and the one this paper addresses, is the possibility of errors in recorded education and their consequences for the estimated returns.
Misrecorded education could arise from data transcript errors, as well as from misreporting: survey respondents may either over-report their attainment, not know if the schooling they have had counts as a qualification or simply not remember. With the continuous years of schooling measure of education, standard results based on classical measurement error show that OLS estimates are downward biased, and that appropriate IV methods applied to the linear regression model provide consistent estimates.
Indeed, the trade-off between attenuation bias due to measurement error and upward bias due to omitted variables correlated with both schooling and wages (the so-called "ability bias") has been at the heart of the early studies on returns to years of schooling. The received wisdom has traditionally been that ability bias and measurement error bias largely cancel each other out (for a review see in particular Griliches, 1977, and Card, 1999; for a recent UK study see Bonjour et al., 2003).
With the categorical qualification-based measure of education, however, any measurement error in educational qualifications will vary with the true level of education. Individuals in the lowest category can never under-report their education and individuals in the top category cannot over-report, so that the assumption of classical measurement error cannot hold (see, for example, Aigner, 1973). In the presence of misclassification, OLS estimates are not necessarily downward biased, so that the cancelling out of the ability and measurement error biases cannot be expected to hold in general. Moreover, it is now well understood that the IV methodology cannot provide consistent estimates of the returns to qualifications (see, for example, Bound, Brown and Mathiowetz, 2001).
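A small simulation (ours, with hypothetical error rates, not estimates from the paper) makes the non-classical nature of the problem concrete: because individuals without the qualification can only over-report and those with it can only under-report, the reporting error is mechanically negatively correlated with true status. In this simple homogeneous-effect case, the OLS slope on the misreported indicator equals the true coefficient scaled by Pr(D* = 1 | D = 1) − Pr(D* = 1 | D = 0).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
true_beta = 1.0

# True binary qualification and (log) wages, with no ability bias for clarity
d_star = rng.random(n) < 0.5
y = true_beta * d_star + rng.normal(0.0, 1.0, n)

# Misreported indicator: error rates depend on the true status, so the
# error is negatively correlated with d_star -- the classical assumptions fail
p_report_one = np.where(d_star, 0.9, 0.3)      # Pr(D = 1 | D*)
d = (rng.random(n) < p_report_one).astype(float)

# OLS slope of y on the misreported indicator (for a binary regressor this
# is the difference in mean wages between reported groups)
slope = np.cov(y, d)[0, 1] / np.var(d)
print(slope)  # around 0.625 = Pr(D*=1|D=1) - Pr(D*=1|D=0), not 1.0
```

Here the bias happens to attenuate; once covariates or heterogeneous returns enter, as in the qualification setting, the bias need not be attenuating, which is the sense in which OLS is not necessarily downward biased and IV corrections designed for classical error fail.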
To date, empirical evidence on the importance of these issues is restricted to the US, where it was in fact shown that measurement error might play a non-negligible role (see the results in Kane, Rouse and Staiger, 1999, Black, Sanders and Taylor, 2003, and Lewbel, 2007). For the UK there are no estimates of the returns to educational qualifications that adequately correct for measurement error. This is of great concern, in view of the stronger emphasis on returns to discrete levels of educational qualifications in the UK and given the widespread belief amongst UK researchers and policymakers that ability and measurement error biases still cancel out (Dearden, 1999, Dearden et al., 2002, and McIntosh, 2006).
A first possibility to overcome the bias induced by misreported educational qualifications is to derive bounds on the returns by making a priori assumptions on the misclassification probabilities (see, for example, Molinari, 2008). This approach only allows partial identification of returns. In previous work (Battistin and Sianesi, 2011) we suggest bounds that can be derived allowing for arbitrarily heterogeneous individual returns. The corresponding sensitivity analysis is easy to implement and can provide an often quite informative robustness check. The alternative approach is more demanding in terms of data requirements but, if feasible, allows point identification of the returns. An additional appealing feature is that it provides estimates of the extent of misclassification in the educational measures, which may often be of independent interest. What is needed is (at least) two categorical reports of educational qualifications for the same individuals, both potentially affected by reporting error but coming from independent sources (for the proof of non-parametric identification, see Mahajan, 2006, Lewbel, 2007, and Hu, 2008). Repeated measurements on educational qualifications are typically obtained by combining complementary datasets, for example exploiting administrative records and information self-reported by individuals. In this paper, we build on the latter approach and provide a number of new contributions of considerable policy and practical relevance, as well as of methodological interest. First, we provide the first reliable estimates of the returns to educational qualifications in the UK that allow for the possibility of misreported attainment. We focus on the highly policy-relevant return from attaining any academic qualification compared to leaving school at the minimum age of 16 without any formal qualification (the latter being akin to dropping out of high-school in the US).
The institutional details and the literature relevant to motivating our interest in this parameter are discussed at length in Section 4. We rely on detailed longitudinal data from the British National Child Development Survey (NCDS), which allows us to control for a large set of family background and school type variables, as well as for ability tests taken by individuals at early ages.
Second, using the unique nature of our data we identify the extent of misclassification in three different data sources on educational qualifications: administrative school files, self-reported information very close to the dates of completion of the qualification, and self-reported recall information ten years later. To this end, we combine multiple measurements self-reported by individuals in the NCDS with administrative data on qualifications coming from school records. Compared to the existing papers in the literature, the availability of multiple self-reported measurements introduces a certain degree of over-identification, which allows us to isolate the extent of misreporting in school files from that of individuals, while allowing for persistence in the propensity to misreport across self-reported measurements. Our setup thus gives us the unique chance of assessing the temporal patterns of misreporting errors across survey instruments, and of decomposing misreporting errors into a systematic component linked to individuals' persistent behaviour and a transitory part reflecting survey errors that occur, independently of individuals' behaviour, in each cross-sectional survey wave.
Third, exploiting the information available in the NCDS data, we explore how the biases from measurement error and from omitted variables interact in the estimation of returns to educational qualifications. We produce a simple calibration rule that allows policy makers to use nationally representative data sets such as the Labour Force Survey to estimate returns to qualifications. Such data rely entirely on self-reported qualifications and contain no information on individual ability and family background.
Finally, on the methodological front we propose a semi-parametric estimation approach based on balancing scores and mixture models. As far as we are aware, we are the first to cast the estimation problem in terms of a mixture model, which combined with the propensity score defines a semi-parametric procedure that allows for arbitrarily heterogeneous individual returns. Given that the misclassification problem can be stated in terms of finite mixtures with a known number of components, we find this approach particularly suited for the case at hand. We first show that all the quantities of interest are non-parametrically identified from the data through the availability of our repeated measurements on educational qualifications. The conditions required for this result are very general in nature, or at least are no more restrictive than those commonly invoked in the relevant literature on misclassification. We then proceed with estimation, drawing from the statistical literature on finite mixtures to propose a flexible strategy based on Bayesian modelling. We maintain throughout a unified general framework for the study of the impact of misreported treatment status on the estimation of causal treatment effects (Mahajan, 2006, Lewbel, 2007, Molinari, 2008, and Battistin and Sianesi, 2011, are the only examples we are aware of). Our estimation method is thus of far wider interest, since the same issues arise in any application looking at the effect of a potentially misrecorded binary or categorical variable, such as eligibility for policy schemes, participation in (multiple) government programmes, or work-related training.
We report a number of findings of substantive importance. Our results suggest that individuals are appreciably less accurate than transcript files when they do not have any academic qualification, but that they are slightly more accurate than transcripts when they do in fact have academic qualifications.
In line with the scant evidence available from the US, we thus find that no source is uniformly better.
For individuals, over-reporting is by far the most important source of error. Under-reporting is more of a problem in transcript files. Notwithstanding their different underlying patterns of measurement error, transcript files and self-reported data appear to be remarkably similar in their overall reliability. This is especially so when information is collected close to the time of attainment of the educational qualification of interest. We estimate that the degree of accuracy in the reporting of educational qualifications in the NCDS is about 80% in both transcript files and self-reported data collected close to attainment of the qualification. This figure is 3 to 4% lower when educational attainment is recalled ten years later.
From estimating the share of individuals who consistently report correctly, over-report and under-report their educational qualification across survey waves of the NCDS, we find that figures from just one wave are unlikely to reveal individuals' reporting behaviour. Our results do however show that the bulk of correct classification can be attributed to some degree of persistence in the reporting of individuals across waves. We estimate that about 90% of measurement error in the NCDS is related to the behaviour of individuals; the remaining error is not systematic, and depends on random survey errors. We further provide strong evidence of positive autocorrelation in self-reported measurements conditional on true educational attainment. This finding in itself invalidates setups that base identification on repeated measurements from the same individuals. A piece of interesting evidence on survey errors is the incidence of recall errors among those with the qualification, which we estimate at 7.7%.
We estimate the true return from achieving any academic qualification for those who do so at a 26.4% wage gain. This figure is statistically different from that obtained from raw data without adjusting for measurement error. When educational records (from schools or individuals) are obtained relatively close to the completion of the qualification of interest, we find that ignoring both ability and misreporting biases would lead to strongly upward-biased estimates of returns. The resulting calibration rule to get an LFS-style estimate close to the true return suggests multiplying the "raw" estimate by 0.8. By contrast, when the educational information recorded in the data has been collected more than ten years after completion, the two biases do seem to cancel each other out, with LFS-style estimates of the average return to academic qualifications being indeed very close to the true return.
The remainder of the paper is organized as follows. In Section 2 we allow for the possibility of misclassification in the treatment status in the general evaluation framework, and discuss the resulting identification problem. Our estimation strategy for the case at hand is presented in Section 3. Section 4 discusses how information in the data will allow us to implement this strategy under fairly weak assumptions. It then presents the evidence on raw returns and on the agreement rates between our multiple measurements. Section 5 presents our empirical results on the extent and features of misclassification, as well as on the true educational returns. We also explore how the biases from misclassification and from omitted variables interact in the estimation of such a return. Section 6 concludes.

General formulation of the problem

Identification when the educational qualification is observed
In the potential-outcomes framework, interest lies in the causal impact of a given "treatment" on an outcome of interest $Y$. To fix ideas, and with our application in mind, in the following let the treatment be the qualification of interest and let the outcome be individual (log) wages. Let $Y_1$ and $Y_0$ denote the potential wages from having and not having the qualification of interest, respectively. Let $D^*$ be a binary indicator for the qualification of interest, which we will later allow to be potentially observed with error amongst individuals. The individual causal effect of (or return to) achieving the qualification is defined as the difference between the two potential outcomes, $\beta \equiv Y_1 - Y_0$. The observed individual wage can then be written as $Y = Y_0 + D^*\beta$. We are interested in recovering the average return for those individuals who have chosen to undertake the qualification of interest, that is the average effect of treatment on the treated (ATT):

$$ATT \equiv E[\beta \mid D^* = 1] = E[Y \mid D^* = 1] - E[Y_0 \mid D^* = 1].$$

In the absence of misreporting of $D^*$, identification of the counterfactual term $E[Y_0 \mid D^* = 1]$ follows straightforwardly from the following two assumptions, which we will maintain throughout.
Assumption 1 (Unconfoundedness) Conditional on a set of observable variables $X$, the educational choice $D^*$ is independent of the two potential outcomes:

$$(Y_0, Y_1) \perp D^* \mid X.$$

For the plausibility of this assumption, which allows one to focus on the impact of measurement error in the reporting of $D^*$, we draw on Blundell, Dearden and Sianesi (2005b), who find the set of regressors $X$ available in our NCDS data to be rich enough to control for the endogeneity of educational choices. To give empirical content to Assumption 1, we also require the following condition on the support of the $X$ variables:

Assumption 2 (Common Support) Individuals with and without the qualification of interest can be found at all values of $X$, that is:

$$0 < e^*(x) < 1 \quad \text{for all } x,$$

where $e^*(x) \equiv \Pr(D^* = 1 \mid X = x)$ is the propensity score.
Under these two assumptions one can perform any type of non- or semi-parametric estimation of the conditional expectation function in the non-participation group, $E[Y \mid X = x, D^* = 0]$, and then average it over the distribution of $X$ in the participants' group (within the common support) to get the counterfactual term of interest. Conditions 1-2 together constitute the strong ignorability condition of Rosenbaum and Rubin (1983).
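The two-step logic can be sketched as follows (our illustrative simulation: a single confounder x stands in for the rich vector X, and a linear model for the conditional expectation).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Simulated confounder (ability), selection into the qualification, wages
x = rng.normal(0.0, 1.0, n)
d = rng.random(n) < 1 / (1 + np.exp(-x))        # D* depends on x
y0 = 2.0 + 0.5 * x + rng.normal(0.0, 0.3, n)    # potential wage without
y1 = y0 + 0.25                                   # true return: 0.25 log points
y = np.where(d, y1, y0)

# Step 1: fit E[Y | X, D* = 0] on the non-participants
X0 = np.column_stack([np.ones((~d).sum()), x[~d]])
coef, *_ = np.linalg.lstsq(X0, y[~d], rcond=None)

# Step 2: average the predicted counterfactual over participants (ATT)
y0_hat = np.column_stack([np.ones(d.sum()), x[d]]) @ coef
att = y[d].mean() - y0_hat.mean()
print(att)  # close to the true 0.25
```

Any flexible non- or semi-parametric fit could replace the linear regression in the first step; the averaging over the participants' distribution of X is what delivers the ATT.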

Misclassified educational qualification
When qualifications are misreported, either because individuals are left to self-report or because of transcript errors, the treatment information recorded in the data may differ from the actual status $D^*$. With our application in mind, we assume to have two repeated measurements of educational qualifications self-reported by individuals at different points in time ($D_S^1$ and $D_S^2$), as well as transcript records on the same individuals coming from the schools ($D_T$). It is worth noting that the former two measurements need not be independent of each other, as they may well be correlated through unobservables that affect the propensity of individuals to misreport. More generally, none of the self-reported and transcript measurements needs to coincide with $D^*$.
Denote by $\Pr(D_S = 1 \mid D^* = 1, X = x)$ and $\Pr(D_T = 1 \mid D^* = 1, X = x)$ the percentage of truth tellers, or of individuals correctly classified in transcript files, amongst those actually holding the qualification of interest. The corresponding percentages amongst those without the qualification of interest are instead $\Pr(D_S = 0 \mid D^* = 0, X = x)$ and $\Pr(D_T = 0 \mid D^* = 0, X = x)$. In the remainder of this paper, we will refer to these terms as probabilities of exact classification. Throughout our discussion we will assume that the misclassification error in either measure is non-differential, that is, conditional on a person's actual qualification and on other covariates, reporting errors are independent of wages (see Battistin and Sianesi, 2011, for a more detailed discussion of the implications of this assumption). This assumption is stated more formally in what follows.

Assumption 3 (Non-Differential Misclassification Given X) Any variables $D_S$ and $D_T$ which proxy $D^*$ do not contain information to predict $Y$ conditional on the true measure $D^*$ and $X$:

$$f_{Y|D^* D_S D_T X}[y \mid d^*, d_S, d_T, x] = f_{Y|D^* X}[y \mid d^*, x].$$

As shown in Battistin and Sianesi (2011), even under Assumptions 1-3 causal inference drawn from any of the triples $(Y, D_S^1, X)$, $(Y, D_S^2, X)$ or $(Y, D_T, X)$ will in general be invalid for the ATT, with the magnitude of the bias depending on the extent of misclassification in each measurement. In what follows, we will maintain the assumption of independent sources of error between self-reported measurements and transcript files, conditional on the observables $X$.

Assumption 4 (Independent Sources of Error Given X) The measurements $D_S$ and $D_T$ are conditionally independent given $D^*$ and $X$:

$$f_{D_S D_T|D^* X}[d_S, d_T \mid d^*, x] = f_{D_S|D^* X}[d_S \mid d^*, x] \, f_{D_T|D^* X}[d_T \mid d^*, x].$$

The assumption implies that qualifications self-reported by individuals and transcript files from schools are correlated only through the true measurement $D^*$ and the observables $X$. This assumption and Assumption 3 are widely adopted in the relevant literature. However, as pointed out by Hu (2008) and Battistin and Sianesi (2011), the conditioning on a large set of $X$'s makes them weaker than those reviewed in Bound, Brown and Mathiowetz (2001).
The general identification problem induced by misclassification can be formalised as follows. Under Assumption 3 and Assumption 4, the distribution of observed wages conditional on $X$ for the $2 \times 2 \times 2$ groups defined by $D_S^1 \times D_S^2 \times D_T$ can be written as a mixture of two latent distributions: the distribution of wages in the presence of the qualification, i.e. of $Y_1$, and the distribution of wages in the absence of the qualification, i.e. of $Y_0$:

$$f_{Y|D_S^1 D_S^2 D_T X}[y \mid d_S^1, d_S^2, d_T, x] = p(d_S, d_T, x) \, f_{Y_1|X}[y \mid x] + (1 - p(d_S, d_T, x)) \, f_{Y_0|X}[y \mid x], \qquad (1)$$

where the equality follows from Assumption 1 and the probability

$$p(d_S, d_T, x) \equiv \Pr(D^* = 1 \mid D_S^1 = d_S^1, D_S^2 = d_S^2, D_T = d_T, X = x)$$

is the mixture weight. Define

$$\Delta^*(x) \equiv E[Y \mid D^* = 1, X = x] - E[Y \mid D^* = 0, X = x],$$

which corresponds, under Assumption 1, to the causal effect of having the qualification of interest for individuals with $X = x$. As the ATT is obtained by integrating $\Delta^*(x)$ with respect to $f_{X|D^*}[x \mid 1]$, and the latter term is identified by Bayes' rule from knowledge of the $p(d_S, d_T, x)$'s, it follows that the ATT is identified if the mixture in (1) is identified (see Battistin and Sianesi, 2011, for the exact characterisation of the relationship between the true ATT, the effect estimated using either misrecorded measure, and the latter's misclassification probabilities).
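For intuition on where the mixture weights come from, the following sketch computes p(d_S, d_T) by Bayes' rule under the conditional independence of the two reports (Assumption 4), suppressing X for simplicity and using hypothetical misclassification rates rather than the paper's estimates.

```python
# Hypothetical probabilities of exact reporting (NOT the paper's estimates):
# Pr(report = 1 | D* = d) for the self-report and the transcript, plus Pr(D* = 1)
lam_s = {1: 0.85, 0: 0.15}   # Pr(D_S = 1 | D* = d)
lam_t = {1: 0.90, 0: 0.10}   # Pr(D_T = 1 | D* = d)
pi = 0.5                     # Pr(D* = 1)

def mixture_weight(d_s, d_t):
    """p(d_S, d_T) = Pr(D* = 1 | D_S = d_s, D_T = d_t), obtained by Bayes'
    rule under conditional independence of the two reports given D*."""
    def lik(d_star):
        ps = lam_s[d_star] if d_s == 1 else 1 - lam_s[d_star]
        pt = lam_t[d_star] if d_t == 1 else 1 - lam_t[d_star]
        return ps * pt
    num = lik(1) * pi
    return num / (num + lik(0) * (1 - pi))

for cell in [(1, 1), (1, 0), (0, 1), (0, 0)]:
    print(cell, round(mixture_weight(*cell), 3))
```

Cells where the two reports agree get weights near 0 or 1, while the disagreement cells are the ones in which observed wages are a genuinely non-degenerate mixture of the two latent distributions.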
In the next section we show that, for the case at hand, the information in the data is sufficient to retrieve non-parametrically both mixture weights and mixture components.

Identification in the presence of misclassification
With two reports, Kane, Rouse and Staiger (1999) and Black, Berger and Scott (2000) developed a procedure to simultaneously estimate the returns to qualifications and the distribution of reporting error in each educational measure. Their approach starts from the specification of a parametric model.
The general problem of non-parametric identification in the case of two reports has been investigated, amongst others, by Mahajan (2006), Lewbel (2007) and Hu (2008). The returns to qualifications and the extent of misclassification are point identified by assuming that the two available measurements come from independent sources of information. This implies that the extent of misclassification must be independent across measurements, and qualifies one of these -provided additional conditions hold -as an instrument-like variable for the problem.
We build upon this idea to show that the components of the mixture in (1) are non-parametrically identified given the setup that we consider. Key to our identification result is Assumption 4. Although three measurements of educational qualifications are available in our data, one can always reduce the dimensionality of the problem by generating a new variable $\tilde{D}$ from the combination of $D_S^1$ and $D_S^2$, for example by considering $\tilde{D} \equiv D_S^1 D_S^2$. In this case, the two new measurements $(\tilde{D}, D_T)$ are sufficient to retrieve the returns to qualifications non-parametrically as in Mahajan (2006), Lewbel (2007) and Hu (2008). The availability of multiple self-reported measurements introduces a certain degree of over-identification, and allows one to isolate the extent of misreporting in school files from that of individuals while allowing for persistence in the propensity to misreport across self-reported measurements of educational qualifications. To the best of our knowledge, this is the first paper that looks into this problem.
The identification result builds upon the following additional assumptions, which closely match those exploited in the relevant literature (see, for example, Chen, Hong and Nekipelov, 2011). The general idea behind identification is to use $D_T$ as a source of instrumental variation which, through Assumption 4, allows one to define a large enough number of moment conditions given the unknowns in the mixture representation (1). The availability of multiple reports coming from the same individuals sets the stage for additional moment restrictions, which can be used to allow for correlation in self-reported measurements.

Assumption 5 (Relevance of Educational Qualifications Given X)
The causal effect of having the qualification of interest for individuals with $X = x$ is such that:

$$\Delta^*(x) \neq 0 \quad \text{for all } x \text{ in the support of } X.$$

This assumption implies that the latent measurement $D^*$ is relevant for the policy parameter under consideration at all values of $X$. Following the discussion in the previous section, the requirement is stated in terms of conditional expectations. However, as we show in Appendix A, it could be formulated in more general terms by considering features of the conditional distribution $f_{Y|D^* X}[y \mid d^*, x]$. Intuitively, this assumption is required to disentangle the mixture distributions in (1) when estimation is carried out from raw data.
The next assumption requires that the measurement $D_T$ contains enough information on the true educational qualification $D^*$ given $X$ (see Chen, Hong and Nekipelov, 2011). For the binary case considered in this paper, a sufficient condition for this to hold is the following.

Assumption 6 (Informational Content of the Transcript Measurement Given X) The extent of misclassification in the measurement $D_T$ is such that transcript files are more accurate than a pure guess:

$$\Pr(D_T = 1 \mid D^* = 1, X = x) + \Pr(D_T = 0 \mid D^* = 0, X = x) > 1 \quad \text{for all } x.$$
This assumption is typically invoked in the literature and is indeed very reasonable, as it implies that information from the school files is more accurate than pure guesses once $X$ is controlled for.
Finally, a more technical assumption is needed to ensure identification, which is implied by a nonzero causal effect of the latent measurement D * on the survey response patterns D S given X (see Appendix A).

Assumption 7 (Relevance of Survey Instruments) For each value $x$ on the support of $X$:

$$f_{D_S|D^* X}[d_S \mid 1, x] \neq f_{D_S|D^* X}[d_S \mid 0, x].$$
The general identification result can be summarized in the following theorem, whose proof is given in Appendix A and which specialises previous results by Hu (2008) to the setup considered in our application.

Estimation
Having proved that information on $(Y, D_S, D_T, X)$ ensures non-parametric identification of the ATT and of features of the error distribution across measurements, we now describe the strategy employed in the empirical section to estimate the quantities of interest. Two key assumptions will be maintained throughout the estimation process. First, we will assume that the mixture components are normally distributed, and propose a method that estimates (1) directly via MCMC. Given that the misclassification problem can be stated in terms of finite mixtures with a known number of components, we find this approach particularly suited for the case at hand. Note also that this is in the spirit of the work by Heckman and Honore (1990), where it is shown that under normality it is possible to estimate the distribution of potential wages in the Roy model from a single cross-section of data (see also the discussion in Heckman, 2001). To reduce the dimensionality problem that results from having a large number of $X$'s, we implement a semi-parametric estimator that makes use of the concept of balancing scores taken from the programme evaluation literature (see Battistin and Sianesi, 2011, for an application of the same idea). The second assumption we make is that the mixture weights are heterogeneous across individuals only through functions of the $X$'s that can be estimated from raw data. The estimation procedure employed is discussed in the remainder of this section.

The curse of dimensionality
The main problem that hampers estimation of the ATT is the curse of dimensionality arising from the large number of regressors in $X$. In this section we propose a method to reduce the dimensionality of the problem based on the properties of balancing scores (see Theorem 1 of Rosenbaum and Rubin, 1983, and Imbens, 2000). Let $S(X)$ be a balancing score such that the distribution of $X$ within cells defined by $S(x)$ is independent of $(D_S, D_T)$:

$$X \perp (D_S, D_T) \mid S(X). \qquad (2)$$

In what follows, we discuss under which conditions the mixture representation given $X$ in (1) implies a mixture representation given $S(X)$. The idea is most simply put across by assuming that the $p(d_S, d_T, x)$'s do not vary with $X$, so that $p(d_S, d_T, x) = p(d_S, d_T)$. By using (2) and the fact that $X$ is finer than $S(X)$, we can write:

$$f_{Y|D_S D_T S(X)}[y \mid d_S, d_T, s] = \int f_{Y|D_S D_T X}[y \mid d_S, d_T, x] \, dF_{X|S(X)}[x \mid s].$$

Using (1) and the fact that the $p(d_S, d_T, x)$'s do not vary with $X$, the term on the right-hand side of the last expression can be written as:

$$p(d_S, d_T) \int f_{Y_1|X}[y \mid x] \, dF_{X|S(X)}[x \mid s] + (1 - p(d_S, d_T)) \int f_{Y_0|X}[y \mid x] \, dF_{X|S(X)}[x \mid s],$$

so that there is:

$$f_{Y|D_S D_T S(X)}[y \mid d_S, d_T, s] = p(d_S, d_T) \, \bar{f}_1[y \mid s] + (1 - p(d_S, d_T)) \, \bar{f}_0[y \mid s], \qquad (3)$$

where the last relationship again follows from $X$ being finer than $S(x)$. Accordingly, the distribution of wages conditional on $S(X) = s$ for the group defined by all combinations of $(D_S, D_T)$ is again a mixture of two latent distributions. The components $\bar{f}_1$ and $\bar{f}_0$ of this mixture are weighted means of the components in (1) taken over individuals with $S(X) = s$, with mixture weights given by $p(d_S, d_T)$.
Note that the same representation would hold if the $p(d_S, d_T, x)$'s were left to vary with $X$ only through the balancing score $S(X)$. The identification problem is similar to the one described in the previous section: if (3) can be recovered from raw data, then one can identify the extent of misreporting and, therefore, the ATT.
To make the definition of $S(X)$ operational, let $G$ be a multinomial variable identifying the $2 \times 2 \times 2$ groups obtained from the cross-tabulation of $(D_S, D_T)$. Define the propensity scores obtained from the multinomial regression of $G$ on the $X$'s as $e_g(x) \equiv f_{G|X}[g \mid x]$. Results in Imbens (2000) ensure that the vector of these propensity scores is a balancing score in the sense of (2).
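As a sketch of the construction (ours: a crude binned frequency estimator stands in for the multinomial regression, and a single covariate for the full X vector), the e_g(x)'s form, at each value of x, a probability vector over the eight cells of the cross-tabulation.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000

# One covariate for illustration; G indexes the 8 cells of D_T x D_S1 x D_S2.
# The dependence of G on x below is hypothetical.
x = rng.normal(0.0, 1.0, n)
logits = np.outer(x, np.linspace(-1.0, 1.0, 8))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
g = (probs.cumsum(axis=1) > rng.random((n, 1))).argmax(axis=1)

# Nonparametric estimate of e_g(x) = Pr(G = g | X = x) on a coarse grid of x
bins = np.quantile(x, np.linspace(0, 1, 21))
cell = np.clip(np.digitize(x, bins[1:-1]), 0, 19)
e_hat = np.zeros((20, 8))
for c in range(20):
    in_c = cell == c
    e_hat[c] = np.bincount(g[in_c], minlength=8) / in_c.sum()

print(e_hat[0])  # each row is a probability vector over the 8 groups
```

In practice a smooth multinomial model, as used in the paper, replaces the binning; the key property is only that the fitted vector of scores balances X across the eight groups.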

Bayesian modelling and inference
In the previous section we have shown that, for the case at hand, the mixture representation holds conditionally on the e g (x)'s if these are the only factors driving heterogeneity of the p(d S , d T , x)'s. We now build on this assumption to estimate the mixture (3).
We will assume throughout that, within cells defined by $S(x)$, mixture components are normally distributed with cell-specific parameters. This amounts to assuming log-normality of wages conditional on the balancing score: given the nature of the outcome variable, this appears to be a sound specification for the case at hand. Importantly, it can be shown that any finite mixture of univariate normal distributions is identifiable (see, for example, Everitt and Hand, 1981), and this has some implications that are discussed in what follows. Under the hypothesis of no returns conditional on $S(x)$ the two mixture components coincide, and thus the mixture representation is invalid. This is known as the problem of homogeneity, and is ruled out by Assumption 5. Note that testing homogeneity, that is testing no mixture against a mixture of two distributions, is a non-regular problem, in that the null hypothesis belongs to the boundary of the parameter space. However, using the results in Yakowitz and Spragins (1968), it follows that any non-degenerate finite mixture of normal distributions cannot itself be normal. It follows that, in our application, testing Assumption 5 under the maintained assumptions of normal components and non-degenerate $p(d_S, d_T, x)$'s amounts to testing normality of the observed wage distributions. The mixture in (3) is estimated through an MCMC procedure, which is fully documented in Appendix B and whose main features can be described as follows. Let $e(x) = [1, e_2(x), \ldots, e_8(x)]'$ be the $8 \times 1$ vector containing the balancing scores. We set:

$$f_{Y|D^* e(X)}[y \mid d, e(x)] = N(e(x)'\theta_d, \sigma_d^2), \qquad d = 0, 1,$$
$$p(d_S^1, d_S^2, d_T, x) = \Phi(e(x)'\gamma_g),$$

for mixture components and mixture weights, respectively, where $\Phi(\cdot)$ is the standard normal distribution function. The former equation defines the $8 \times 1$ vectors of parameters $\theta_0$ and $\theta_1$, and the scalars $\sigma_0^2$ and $\sigma_1^2$. The latter equation defines the $8 \times 1$ vector $\gamma_g$ for any combination $g$ of $D_T \times D_S^1 \times D_S^2$. Overall, this specification defines 82 unknowns that fully characterise the mixture (3).
We specify a joint prior distribution for these parameters, and we use a Gibbs sampling algorithm to obtain 2,000 realizations from their joint posterior distribution. The posterior distributions for the unknown quantities of the mixture representation (3) can easily be computed using these realizations.

[8] Perhaps the most natural and intuitive way of addressing the identification problem for mixtures of parametric distributions is found in Yakowitz and Spragins (1968), who show that a necessary and sufficient condition for the mixture to be identifiable is that the mixture components form a linearly independent set over the field of real numbers. This condition is met for mixtures of normal distributions. Using this result, our estimation procedure could be extended to more general families of parametric distributions.

[9] We implemented a simple test for this hypothesis through a suitably defined set of regressions. Within cells defined by the cross-tabulation of the three measurements of educational attainment, we regressed logged wages on the balancing scores and tested for the normality of residuals. The results of this procedure, which are available upon request, are overall against the normality of logged wages.

[10] It is worth noting that the estimation results proved very similar to those obtained in a previous version of this paper, where maximum likelihood estimation via the EM algorithm was employed.
Knowledge of these quantities is in turn sufficient to obtain estimates of the probabilities of exact classification and of the ATT.
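The Gibbs sampling step can be illustrated with a deliberately simplified sketch: a two-component normal mixture with a scalar weight and conjugate priors, leaving out the probit dependence of weights and means on the balancing scores used in our specification. All prior settings below are illustrative assumptions, not the ones adopted in the paper.

```python
import math
import random

def gibbs_two_normal_mixture(y, n_iter=2000, seed=0):
    """Gibbs sampler for a two-component normal mixture.

    Simplified sketch: scalar mixture weight p ~ Beta(1, 1), means
    mu_k ~ N(0, 10^2) and variances sigma2_k ~ InvGamma(2, 1) -- all
    illustrative priors, not the paper's specification.
    Returns draws of (p, mu0, mu1, sigma2_0, sigma2_1).
    """
    rng = random.Random(seed)
    n = len(y)
    mu = [min(y), max(y)]          # crude but effective starting values
    sig2 = [1.0, 1.0]
    p = 0.5
    draws = []
    for _ in range(n_iter):
        # 1) sample component labels given the current parameters
        z = []
        for yi in y:
            w1 = p * math.exp(-(yi - mu[1]) ** 2 / (2 * sig2[1])) / math.sqrt(sig2[1])
            w0 = (1 - p) * math.exp(-(yi - mu[0]) ** 2 / (2 * sig2[0])) / math.sqrt(sig2[0])
            z.append(1 if rng.random() < w1 / (w0 + w1) else 0)
        # 2) sample the mixture weight from its Beta full conditional
        n1 = sum(z)
        p = rng.betavariate(1 + n1, 1 + n - n1)
        # 3) sample means and variances from conjugate full conditionals
        for k in (0, 1):
            yk = [yi for yi, zi in zip(y, z) if zi == k]
            nk = len(yk)
            ybar = sum(yk) / nk if nk else 0.0
            v = 1.0 / (nk / sig2[k] + 1.0 / 100.0)   # posterior variance of mu_k
            m = v * (nk * ybar / sig2[k])            # posterior mean of mu_k
            mu[k] = rng.gauss(m, math.sqrt(v))
            a = 2 + nk / 2                           # InvGamma(2, 1) prior update
            b = 1 + 0.5 * sum((yi - mu[k]) ** 2 for yi in yk)
            sig2[k] = b / rng.gammavariate(a, 1.0)
        draws.append((p, mu[0], mu[1], sig2[0], sig2[1]))
    return draws
```

On well-separated simulated data the posterior draws concentrate around the true component means and weight; the full algorithm in Appendix B additionally links means and weights to the balancing scores.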

Data and educational qualifications of interest

4.1 Data
In this paper we only consider methods relying on Assumption 1, and we thus require very rich background information capturing all those factors that jointly determine the attainment of educational qualifications and wages. We use the uniquely rich data from the British National Child Development Study (NCDS), a detailed longitudinal cohort study of all children born in a week of March 1958, which contains extensive and commonly administered ability tests at early ages (mathematics and reading ability at ages 7 and 11), accurately measured family background (parental education and social class) and school type variables. In fact, Blundell, Dearden and Sianesi (2005b) could not find evidence of remaining selection bias for the higher education versus anything less decision once controlling for the same variables we use in this paper. We thus invoke their conclusion in assuming that these variables are rich enough to control directly for selection.
Our outcome is real gross hourly wages at age 33. As to educational attainment, of particular interest for our purposes is that cohort members were asked to report the qualifications they had obtained as of March 1981 not only in the 1981 Survey (at age 23), but also in the 1991 Survey (at age 33).[11] [12] For each individual we thus have three measurements which, as we argue in the next section, can all be taken as proxies of educational qualifications acquired by age 20. These are the measurements that we will use to implement the strategy described in Section 3.
We focus on males, further restricting attention to those in work (and with wage information) in 1991 and for whom none of the three educational measures is missing.[13] These criteria leave us with a final sample of 2,716 observations, which is the same sample used by Battistin and Sianesi (2011).

[11] After having been asked about qualifications obtained since March 1981, cohort members were asked to "help us check our records are complete" in two steps. First, they had to identify on a card all the qualifications they had obtained in their lives (including any they had just told the interviewer about), and subsequently they had to identify any of these that had been obtained before March 1981.

[12] Similar details were collected from other institutions if pupils had taken such examinations elsewhere. Results were obtained for approximately 95% of those whose secondary school could be identified.

[13] It is reassuring to note that the patterns that emerge from the following tables are the same irrespective of whether the sample is selected on the basis of non-missing educational information or non-missing wage information in 1991 (the latter obviously also restricting attention to those employed in 1991).

Educational qualifications of interest
Non-parametric identification of the misclassification probabilities requires access to at least two independent measurements of educational attainment (in the sense explained in Section 2.2). In the NCDS data, such measurements are offered by self-reported attainment and by the School Files, the latter however only recording academic qualifications, and only those achieved by age 20, that is Ordinary (O) and Advanced (A) levels.[14]

A highly policy relevant parameter, and the one we focus on in our application, is the return from attaining any academic qualification (that is, from acquiring at least O levels) compared to leaving school at the minimum age of 16 without any formal qualification.[16] In such a context it is thus of great policy interest to estimate the returns to finishing school with O levels compared to leaving with no qualifications. Indeed, Blundell, Dearden and Sianesi (2005b) found a non-negligible return of 18% for those who did leave with O levels and of 13% for those who dropped out at 16 without any qualifications.

[14] In the British educational system, those students deciding to stay on past the minimum school leaving age of 16 can either continue along an academic route or else undertake a vocational qualification[15] before entering the labour market. Until 1986, pupils choosing the former route could take O levels at 16 and then possibly move on to attain A levels at the end of secondary school at 18. A levels still represent the primary route into higher education.

[15] In fact, there is a wide assortment of options ranging from job-specific, competence-based qualifications to more generic work-related qualifications, providing a blend of capabilities and competencies in the most disparate fields.

[16] Although the British system is quite distinct from the one in the US, one could regard the no-qualifications group as akin to the group of high-school drop-outs. The "None" category also includes very low-level qualifications at NVQ level.
Furthermore, the return to acquiring at least O levels compared to nothing captures all the channels through which the attainment of O levels at 16 can impact on wages later in life, in particular the potential contribution that attaining O levels may make to the attainment of A levels and then of higher education.
Having defined the parameter of interest, it is important to highlight the condition that allows us to have repeated measurements of achievement at age 20 coming from both school records and NCDS survey reports. As O level attainment is recorded by the schools by the time individuals were aged 20, while it is self-reported by individuals by the time they were aged 23, we need O level qualifications to be completed by age 20. The UK educational system is indeed such that O levels are obtained before age 20, with the official age being 16.[17] Table 1 reports estimates of the return to the treatment of interest, i.e. of having obtained any academic qualification by age 20.

Evidence from the raw data
As in Blundell, Dearden and Sianesi (2005b), we find that while results change little in response to the method used to control for selection on observables, controlling for ability test scores at an early age and for detailed family background measures is crucial, significantly reducing the returns to a 15 to 28% wage gain depending on the educational measure used. As to the latter, it is indeed striking that in the more flexible models (fully interacted linear model and matching), using one educational measure rather than another gives rise to returns which exhibit the same magnitude of bias as from omitted controls. In particular, matching yields estimates of returns which range from as low as 14.2% (self-reported 13 years after attainment) to as high as 27.8% (self-reported 3 years after attainment), with returns estimated from school files falling in between (23.9%). For all three estimation methods, and irrespective of the set of control variables being used, the estimates using self-reported measures at different times, as well as those using transcript vs recall information, are significantly different (99% level; see the right hand side panel of Table 1). Estimates arising from measures obtained close to completion (i.e. transcript and self-reports at age 23) are by contrast not statistically different.
To investigate such substantial differences in estimated returns according to the educational report being used, Table 2 presents cross tabulations between the three underlying measurements. We find that the percentage of the sample where the three measures all agree is 82%. In what follows, we will refer to this statistic as the "agreement rate". Despite this being quite high, there are still important differences between the information contained in the reports. Of particular interest for our results, the incidence of academic qualifications in the population is 58.8% according to transcript information, whilst according to self-reports it is considerably higher, around 65% in both interviews.
If we were to believe the school files, only 3.1% of those students who did achieve O levels reported having no academic qualifications at age 23. At age 33, when asked to recall the qualifications they had attained by age 23, individuals are observed to make more mistakes, with 8% of O-level achievers "forgetting" their attainment. Conversely, still taking the school files at face value, it appears that almost one fifth of those with no formal qualifications over-report their achievement when interviewed at age 23. As was the case with under-reporting, over-reporting behaviour seems to worsen when moving further away from the time the qualification was achieved. When relying on recall information, almost one fourth of individuals with no formal qualifications state that they have some.
The highest agreement rates are observed between transcript files and self-reported information close to completion (an agreement rate of 90% and a kappa-statistic of 0.792), while the lowest are found between transcript information and self-reported information based on recall (an agreement rate of 85% and a kappa-statistic of 0.692). The degree of congruence in information provided by the same individual 10 years apart falls in the middle (an agreement rate of 88% and a kappa-statistic of 0.745). The kappa statistics show a degree of agreement that Landis and Koch (1977) would view as substantial (kappa between 0.61 and 0.80).
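The agreement rate and kappa statistic reported above follow from the cell proportions of a 2 x 2 cross-tabulation. The cells below are illustrative values back-solved from the reported marginals (58.8% transcript incidence, 64% survey incidence, 90% agreement), not the exact counts from Table 2:

```python
def agreement_and_kappa(p11, p10, p01, p00):
    """Agreement rate and Cohen's kappa for two binary indicators,
    given cell proportions p_ab = Pr(first = a, second = b)."""
    po = p11 + p00                       # observed agreement rate
    m1, m2 = p11 + p10, p11 + p01        # marginal "qualification" rates
    pe = m1 * m2 + (1 - m1) * (1 - m2)   # agreement expected by chance
    return po, (po - pe) / (1 - pe)

# Illustrative cells consistent with the transcript vs. age-23 comparison.
po, kappa = agreement_and_kappa(p11=0.564, p10=0.024, p01=0.076, p00=0.336)
# po = 0.90; kappa is approximately 0.79, close to the reported 0.792
```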
One can follow Mellow and Sider (1983) and perform a descriptive analysis of the determinants of concordance across indicators of educational attainment. In results not shown, we find that only a couple of measured characteristics seem to matter in predicting agreement rates. In particular, having a father whose social class is professional is associated with a higher probability of agreement between the individual's two self-reports, and consequently among all three reports. Higher ability as measured by mathematical test scores at 11 is associated with a higher probability of agreement between self-reported and school information, the link being particularly strong close to completion, but remaining significant 10 years on. This association also means that high ability individuals have a higher likelihood of the three measures being congruent. Finally, school-type variables were found to be associated with the degree of concordance, with some types of schools (secondary modern and comprehensive) being associated with lower overall agreement rates. Overall, observed characteristics are thus found to have a very low predictive power of the degree of concordance, this being particularly the case when trying to infer the likelihood that information from the school files close to completion agrees with self-reported information 10 years later (all the control variables jointly explain 3.9% of the variance). By contrast, observables matter more in modelling the probability that individuals and schools agree close to the attainment of the qualification of interest.
In conclusion, even though formal statistics like the kappa measure of interrater agreement may show that there is substantial agreement between educational measures, we have seen that remaining divergences in the resulting treatment indicators can lead to substantially and significantly different impact estimates, indeed of the same magnitude as not controlling for the rich set of variables available in the NCDS. Furthermore, taking the school files at face value, there appears to be much more over- than under-reporting, and reporting errors seem to get worse when individuals are asked to recall their qualifications. While it appears natural to take the school files as being closer to the "truth", this is however by no means an a priori correct assumption, and one which will be assessed empirically in the next section.

Results
This section presents our empirical results on the extent and features of misclassification, as well as on the true return to academic qualifications, that is, one which takes into account the misreporting uncovered in the data. We also explore how the biases from misclassification and from omitted variables interact in the estimation of such a return. We first define the quantities needed to characterize misreporting across survey and transcript measurements. To ease readability, the conditioning on observables X will be left implicit throughout.[19]

For each measurement W, the probability of correct classification (equivalent to the event W = D*) can be computed by averaging the two probabilities of exact classification:

    Pr(W = D*) = f_{W|D*}[1|1] f_{D*}[1] + f_{W|D*}[0|0] f_{D*}[0].

Summary of the quantities retrieved
The extent of misclassification in the measurement W is defined as one minus this quantity. Estimates of these quantities will be presented in Table 3.
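The averaging just described is a one-line computation. In the sketch below, the 0.69 accuracy among non-holders and the 76.5% overall rate are the figures reported for the 1991 wave, while the 0.807 accuracy among holders is back-solved from them and should be read as illustrative:

```python
def correct_classification(acc_with, acc_without, incidence):
    """Pr(W = D*) = f_{W|D*}[1|1] f_{D*}[1] + f_{W|D*}[0|0] f_{D*}[0]."""
    return acc_with * incidence + acc_without * (1.0 - incidence)

# 1991 wave: accuracy among those without the qualification is 1 - 0.31 = 0.69;
# 0.807 among holders is back-solved so that the overall rate matches the
# reported 76.5% given a true incidence of 64.1%.
overall = correct_classification(0.807, 0.69, 0.641)   # ~0.765
```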
The availability of repeated measurements coming from the same individuals allows us to define more structural parameters that reveal the individuals' propensity to misreport across waves.
The probability of consistently reporting the truth in both survey waves can be computed as

    Pr(D_S^1 = D*, D_S^2 = D*) = f_{D_S|D*}[1, 1|1] f_{D*}[1] + f_{D_S|D*}[0, 0|0] f_{D*}[0],

thus averaging probabilities that involve the survey response patterns. Similarly, one can define the probabilities of consistent over-reporting and of consistent under-reporting across the two waves. Estimates of all these quantities will be presented in Table 4.
The comparison between the percentage of truth tellers, on the one hand, and the percentage of correct classification in each survey wave, on the other hand, should reveal how much the latter results from behavioural attitudes of respondents or from survey errors.
Finally, we define the probability of recall errors from the event D* = 1, D_S^1 = D*, D_S^2 = 1 - D*, denoting individuals holding the qualification of interest who report so at age 23, but who do not recall having the qualification ten years later. The probability of this event can be computed as:

    Pr(D* = 1, D_S^1 = 1, D_S^2 = 0) = f_{D_S|D*}[1, 0|1] f_{D*}[1].
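The formula above can be checked against the estimates reported later in this section (7.7% recall errors among qualification holders, 64.1% true incidence, about 5% in the full sample):

```python
# Pr(D* = 1, D_S^1 = 1, D_S^2 = 0) = f_{D_S|D*}[1, 0|1] * f_{D*}[1]
recall_given_qual = 0.077   # incidence of recall errors among holders
true_incidence = 0.641      # estimated f_{D*}[1]
recall_share = recall_given_qual * true_incidence   # ~0.049, i.e. about 5%
```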

Characterising the extent of misclassification
The first three panels of Figure 1 present, for each measurement, the distribution across individuals of the probabilities of exact classification; the figures reported in Table 3 are simply the averages of these distributions.
Our results suggest that individuals are appreciably less accurate than transcripts when they do not have any academic qualification, and this is even more so when survey reports from the later 1991 wave are considered. Specifically, the bulk of the distributions in the left hand side column of Figure 1 increasingly shifts towards lower values as one moves down the three indicators (D_T, D_S^1, D_S^2). The averages reported in the second row of Table 3 summarise the extent of misclassification/over-reporting for individuals without academic qualifications as being 16% in the school files, but as high as 27% and 31% in the 1981 and 1991 surveys. Thus, while the degree of accuracy of self-reported measurements is between 11 and 15 percentage points lower than that of transcript records, we find only a small effect of the time of reporting for individuals without the qualification of interest (the survey closer to completion is only 4 percentage points more accurate than the survey 10 years later).
On the other hand, it seems that individuals are slightly more accurate than transcripts when they do in fact have academic qualifications (see the right hand side column of Figure 1, and the first row of Table 3). Individuals with qualifications are between 3 and 7 percentage points more likely than schools to correctly report their attainment, again pointing to little or no survey wave effect.
In line with the little evidence available from the US, no source thus appears to be uniformly better.
For individuals, we find that over-reporting is by far the most important source of error and that both types of reporting error worsen over time. Under-reporting is more of a problem in transcript files, although the incidence of errors coming from under- and over-reporting is markedly more balanced there than in the individuals' self-reports.
Notwithstanding their different underlying patterns of measurement error, the two types of data sources appear to be remarkably similar in their overall reliability, especially when the sources collect the information of interest close in time. Specifically, the extent of correct classification for the school files is estimated at 80%, for the 1981 wave at 80.3% and for the 1991 wave at 76.5% (see the last row of Table 3). These numbers thus suggest that self-reported measurements close to completion are just as accurate as the administrative information coming from the schools. The degree of accuracy is however around 4 percentage points lower when the information is collected up to 10 years after the qualification was attained.
Using the misclassification probabilities, we recovered an estimate of the true incidence of academic qualifications in the population, namely f_{D*}[1], of 64.1%. Interestingly, while being substantially higher than the incidence according to the school files (58.8%), this estimate essentially coincides with the incidence according to either self-reported educational measure (64.0% in the 1981 wave and 65.0% in the 1991 wave).
The availability of two repeated measurements of qualifications which were self-reported by the same individuals at two points in time gives us the unique chance of assessing the temporal patterns of misreporting errors across survey instruments and of decomposing misreporting errors into a systematic component linked to individuals' persistent behaviour and into a transitory part reflecting survey errors that occur independently of individual behaviour in each cross section survey wave. Table 4 offers important insights on the nature of these errors.
First, the proportion of consistent truth-tellers, that is of those individuals who correctly self-report their educational attainment in both survey waves, is considerably higher amongst those who do have academic qualifications (76.9%) than amongst those who do not (63.1%). This is graphically corroborated by the corresponding distributions across individuals presented in the bottom panel of Figure 1.

Our results further provide a formal test against the assumption that self-reported measurements in the 1981 and the 1991 surveys are conditionally independent given D*. This would amount to assuming conditionally independent errors in the two survey measurements, thus ruling out possible correlation that may arise, for example, from an unobserved individual propensity to misreport. Under the assumption stated, the covariance between D_S^1 and D_S^2, conditional on the true attainment D*, would be zero, meaning that the probability of consistent classification in Table 4 should be equal to the product of the probabilities of exact classification in the two waves in Table 3. The evidence that we find clearly points to a different pattern, for individuals with and without the qualification alike.[20]

In survey data asking for a positive trait, one would expect the share of consistent under-reporters to be much lower than that of over-reporters. Indeed, at 11.2%, it is almost half the size. As was the case for over-reporting, focusing on one survey wave alone would overstate the amount of under-reporting. Interestingly, once we again combine the cross-sectional and panel results, we find that the share of under-reporting errors accounted for by non-systematic survey errors is almost identical to the one that accounted for over-reporting errors (27 to 40%), giving us confidence that we have indeed isolated the true random error component that occurs independently of individual behaviour.
The last group, the "confused", are those whose attainment is correctly recorded in one wave, but misrecorded in the other. This group makes up 14% of the NCDS sample, with slightly more "confused" among the no-qualification group (17%) than among the qualification group (12%). The most interesting subgroup amongst the "confused" is the group affected by recall bias, whose share is given by f D S |D * [1, 0|1]. We estimated the incidence of recall errors among those with the qualification at 7.7%, and in the NCDS sample at 5%.
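The conditional-independence check discussed above amounts to comparing a joint probability with a product of per-wave marginals. The joint figure of 0.769 for qualification holders is reported in Table 4; the per-wave accuracies of 0.844 and 0.807 are back-solved from the reported aggregates and should be read as illustrative:

```python
def excess_joint_accuracy(joint, acc_wave1, acc_wave2):
    """Difference between Pr(both waves correct | D*) and the product of the
    per-wave probabilities of exact classification; zero under conditional
    independence of the two self-reports given true attainment D*."""
    return joint - acc_wave1 * acc_wave2

# Among qualification holders: a clearly positive excess signals correlated
# (systematic) misreporting across the two survey waves.
excess = excess_joint_accuracy(0.769, 0.844, 0.807)
```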

Returns to any academic qualification
With the misclassification probabilities in hand, we can then proceed to estimate the true ATT from achieving any academic qualification as outlined in Section 3. Throughout this section, the following notation will be employed. ∆*_FULL and ∆*_LFS denote estimates that are adjusted for misclassification and employ either the full set of controls available in the NCDS or the LFS-style variables. Similarly, estimates obtained from raw data without controlling for misclassification will be denoted by ∆_FULL and ∆_LFS.

[20] Note also that this correlation cannot be explained by the observable characteristics X: the evidence discussed is against the assumption that D_S^1 and D_S^2 are conditionally independent given D* and X, as there must be at least one value of X such that the latter assumption is violated. Figure 6 in Appendix C presents the conditional distributions f_{D_S^2|D_S^1,D*,X}[a | 0, b, x], visualizing the strong correlation across self-reports in the two survey waves.
The most reliable estimate for the ATT (∆*_FULL) is a 26.4% wage gain from achieving at least O levels, with a posterior standard deviation of 0.065. When we correct for misrecording but rely only on the smaller set of controls (∆*_LFS), the estimated ATT is 37.8% with a posterior standard deviation of 0.043 (note that we use this limited set of variables both to estimate the misclassification probabilities and to then estimate the return). Taken together, these two results point to a 43% upward bias in estimated returns that do not fully control for selection into educational attainment.
To put these estimates in context, Table 5 displays the new results together with our OLS estimates from Table 1. In the following, we focus on the OLS estimates, as the fully interacted regression model (FILM) did not provide evidence of heterogeneous returns. It follows that in the remainder of this section ∆_FULL or ∆_LFS will refer to point estimates obtained through OLS regressions. In order to heuristically compare frequentist and Bayesian estimates, we constructed p-values using the asymptotic distribution of the OLS estimator, calculating the probability of values larger, in absolute terms, than ∆*_FULL. This amounts to assuming that the latter is the true value of the ATT. To ease readability, in the table we simply refer to these numbers as p-values for the statistical difference between ∆*_FULL and ∆_FULL, or between ∆*_FULL and ∆_LFS.

Estimating returns based on educational reports that were obtained relatively close to the attainment of the qualification of interest
Ignoring both omitted-ability bias and potential misclassification in recorded attainment close to completion (either in the school files or self-reported), we find a return to academic qualifications (∆_LFS) of 33%. Correcting for selection bias using our rich set of observed background characteristics reduces the estimated ATT (∆_FULL) to 19%. The value ∆*_FULL thus appears to be bounded from below by the estimate that controls only for selection bias and from above by the LFS-style estimate that controls for neither source of bias. Both these estimates are significantly different from the true return and would provide a misleading picture of how much people with academic qualifications have gained by investing in education.
What can we say about the relative importance of omitted ability and measurement error biases, and about the possibility that the two cancel out when the qualification is recorded close to its attainment? By comparing the true return (∆*_FULL) to the one ignoring both types of potential biases (∆_LFS), we do not find any evidence of offsetting biases; quite the contrary, ignoring both biases leads to a sizeable upward bias in estimated returns of over one quarter (26%). This result is reassuringly consistent with the findings in Battistin and Sianesi (2011), who bound the ATT of interest semi-parametrically and find that ignoring both misreporting and omitted ability bias would generally lead to at times quite severely upward biased estimates of true returns.
The resulting calibration rule to bring the LFS-style estimate of the average return to academic qualifications for males close to the true return is to multiply the "raw" estimate by 0.8. It should be noted that these conclusions apply equally to education measured by the school and to education self-reported by the individuals themselves.
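The arithmetic behind the 0.8 calibration factor, using the point estimates reported above:

```python
# Point estimates from this section (concurrent educational reports)
raw_lfs_return = 0.33    # LFS-style estimate, no ability/background controls
true_att = 0.264         # misclassification- and selection-adjusted ATT
calibrated = 0.8 * raw_lfs_return   # 0.264, matching the corrected return
```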
As to the relative importance of ability and measurement error biases, we find that while both sources of bias give rise to estimates that are significantly different from the true return, the bias arising from omitted ability controls is the larger one. In particular, we have shown above how estimates that correct for measurement error but not for omitted ability incur a 43% upward bias, whilst controlling for ability but ignoring misclassification error in concurrent reports leads to a 27% downward bias, both in the case of the self-reported measure and of school transcripts.
To conclude, in a situation where educational records were obtained relatively close to the completion of the qualification of interest, we find that the policymaker or analyst cannot simply rely on measurement error to cancel out the ability bias.

Estimating returns based on educational reports that rely on recalling the attainment of the qualification of interest over more than 10 years
We now turn to consider a situation in which the educational information recorded in the data was collected more than 10 years after completion. Since, in line with a priori expectations, we have found the recall measure to suffer from a larger extent of measurement error, we now expect the relative importance of omitted variable bias and measurement error bias to shift.
Indeed, relying on the recall educational measure and controlling only for the LFS-style variables, the estimated raw return (∆_LFS) is 29.3%, which is almost halved once we control for the full set of observables (∆_FULL being equal to 15.1%). However, once we compare these estimates to the true return (∆*_FULL) of 26.4%, we find that the latter is very close to and statistically indistinguishable (at the 90% level) from the raw estimate ∆_LFS. In this application, measurement error in recall information is thus strong enough to fully compensate for the upward bias induced by omitting ability controls.
Specifically, while estimates that correct for misclassification but not for selection incur a 43% upward bias (compare ∆*_FULL to ∆*_LFS), controlling for selection but ignoring misclassification gives rise to a bias of exactly the same size (43%) but of the opposite sign. Hence, in sharp contrast to the situation where information on education was obtained relatively close to attainment, when relying on recall information it does seem to be the case that the two biases cancel each other out. There thus seems to be no need for a calibration rule: LFS-style estimates of the average return to academic qualifications based on recall information on qualifications are indeed very close to the true return.

Conclusions
In this paper we have provided reliable estimates of the returns to educational qualifications in the UK that allow for the possibility of misreported attainment. We have additionally identified the extent of misreporting in different types of commonly used data sources on educational qualifications: exam transcript files and self-reported educational measures at different elapsed times after completion of the qualification of interest. We have thus provided estimates of the relative reliability of these different data sources, as well as of the temporal correlation in individual response patterns.
We have also provided evidence on the relative importance of ability and measurement error biases, and produced some simple calibration rules as to how to correct returns estimated on data that rely on self-reported measures of qualifications and contain limited or no information on individual ability and family background characteristics (such as the Labour Force Survey).
Results in this paper thus represent a new piece of evidence for the UK policy community, which will allow one to appreciate the relative reliability of different sources of educational information as well as check the robustness of current estimates of returns to the presence of misreported qualifications.
Knowing the extent of misreporting also has obvious implications for the interpretation of other studies that use educational attainment as an outcome variable or for descriptive purposes.

Note to Table 5. The top panel of the table reports the ATT computed as described in Section 3 (∆*_FULL), which represents our most reliable estimate (posterior standard deviation in parentheses). It is obtained using the full set of controls and adjusting for misclassification. Also reported are the OLS estimates of the same parameter from Table 1, using LFS controls (∆_LFS) and the full set of controls available in the NCDS sample (∆_FULL). P-values test the equality of the two estimates (see Section 5 for a description of how the test was implemented).

A Appendix A -Proof of non-parametric identification
The aim of this Appendix is to show that the setup considered in Section 2 is sufficient to non-parametrically identify the mixture components f_{Y|D*}[y|d*] and the extent of misclassification in the data. The result in what follows generalizes Hu (2008) to allow for over-identification which, for the case at hand, arises because of the availability of repeated measurements coming from the same individuals; for simplicity, the conditioning on X = x will be left implicit throughout.
Let the following matrices, constructed from raw data, be defined:

    F_{Y,D_S|D_T}[y] = [ f_{Y,D_S|D_T}[y, i | j] ],        F_{D_S|D_T} = [ f_{D_S|D_T}[i | j] ],

with rows indexed by the values i taken by D_S and columns by the values j taken by D_T.
Using Assumptions 3 and 4 we have:

    f_{Y,D_S|D_T}[y, d_S | d_T] = Σ_{d*} f_{Y|D*}[y | d*] f_{D_S|D*}[d_S | d*] f_{D*|D_T}[d* | d_T],

or, in matrix notation:

    F_{Y,D_S|D_T}[y] = F_{D_S|D*} Diag{ f_{Y|D*}[y | 0], f_{Y|D*}[y | 1] } F_{D*|D_T},    (4)
    F_{D_S|D_T} = F_{D_S|D*} F_{D*|D_T}.    (5)

Now, under Assumption 6 the matrix F_{D*|D_T} is nonsingular (i.e. of full rank), so that from (5) we obtain:

    F_{D_S|D*} = F_{D_S|D_T} F_{D*|D_T}^{-1},    (6)

which, if substituted into (4), yields:

    F_{Y,D_S|D_T}[y] = F_{D_S|D_T} F_{D*|D_T}^{-1} Diag{ f_{Y|D*}[y | 0], f_{Y|D*}[y | 1] } F_{D*|D_T}.

Identification of F_{Y|D*}, F_{D*|D_T} and F_{D_S|D*} is achieved by considering a particular type of generalized inverse, called the right Moore-Penrose inverse, which here always exists and is unique provided that the matrix to be inverted is of full rank (see, for example, Seber, 2008). Define:

    A+ = A'(AA')^{-1}.

The matrix A+ is known as the right Moore-Penrose inverse of the matrix A and has the property that AA+ equals the identity matrix. It follows that the unknown matrices on the right hand side of (4) and (5) can be recovered from the observable matrices F_{Y,D_S|D_T}[y] and F_{D_S|D_T}.

The above argument may be generalized further to accommodate D*, D_S and D_T being categorical random variables taking an arbitrary number of values, as long as the independence assumption between D_T and D_S is maintained. The proof would proceed along the same lines. In this more general setting, the main complication lies in the fact that F_{D*|D_T} is no longer a square matrix, and the existence of its left generalized inverse, crucial to obtain equation (6), is not guaranteed by the full rank condition stated above. It must also be the case that the number of columns of the matrix to be inverted is larger than the number of its corresponding rows. In our setup, this would amount to assuming that the support of the instrument D_T is larger than the support of the latent random variable D*, an assumption which is standard in the literature on instrumental variables.

B Appendix B - Estimation via MCMC

In what follows we describe the MCMC procedure used to estimate the weights and components of the mixture model. Let e(x) = (1, e_1(x), ..., e_{K-1}(x))' be the vector of propensity scores obtained from a multinomial regression of G, defined as in Section 3.2, on a set of conditioning variables X. In our empirical application K = 8.
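The right Moore-Penrose inverse invoked in the identification argument is easy to verify numerically; the matrix below is an illustrative full-row-rank example, not taken from the data:

```python
import numpy as np

def right_pinv(A):
    """Right Moore-Penrose inverse A+ = A'(AA')^{-1}; requires A to have
    full row rank (at least as many columns as rows), and then
    A @ right_pinv(A) equals the identity matrix."""
    return A.T @ np.linalg.inv(A @ A.T)

# Illustrative 2 x 4 full-row-rank matrix (wider than tall, as required).
A = np.array([[0.70, 0.10, 0.15, 0.05],
              [0.20, 0.50, 0.10, 0.20]])
A_plus = right_pinv(A)
assert np.allclose(A @ A_plus, np.eye(2))
```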

Additional References

Albert, J. H., and Chib, S. (1993). Bayesian Analysis of Binary and Polychotomous Response Data. Journal of the American Statistical Association, 88(422), 669–679.

Hu, Y. (2008). Identification and Estimation of Nonlinear Models with Misclassification Error Using Instrumental Variables: A General Solution. Journal of Econometrics, 144(1), 27–61.

Seber, G. A. F. (2008). A Matrix Handbook for Statisticians. Hoboken, NJ: Wiley.
Assume that the indicator:
$$D^{*} = \mathbb{1}\{\text{the individual attained the qualification}\}$$
is distributed as a Bernoulli random variable with success probability $p(d^S, d^T, e(x))$. Note that, in the setup considered, $D^*$ is a latent quantity. We will assume throughout that the mixture components are normally distributed, as explained in Section 3, namely:
$$Y \mid D^* = i \;\sim\; N\big(\mu_i(e(x)), \sigma_i^2\big), \qquad i = 0, 1.$$
Note that the propensity score affects, by construction, only the mean of the potential outcome distribution, while homoscedasticity across individuals is retained. The functions $p(d^S, d^T, e(x))$ and $\mu_j(e(x))$ we select are as follows:
$$p_g(e(x)) = \Phi\big(e(x)'\gamma_g\big), \qquad \mu_j(e(x)) = e(x)'\theta_j,$$
where $\gamma_g$ and $\theta_j$ are $K$-dimensional parameter vectors and $\Phi(\cdot)$ is the standard normal cumulative distribution function. Note that the subscript $g$, indexing the combination of $d^S$ and $d^T$ considered, is introduced in the definition of $p_g(e(x))$ to simplify notation. This setup defines the following vector of parameters:
$$\xi = \{\theta_0, \sigma_0^2, \theta_1, \sigma_1^2, \gamma_1, \ldots, \gamma_K\}.$$
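The data-generating process just described can be simulated directly; the following sketch uses hypothetical parameter values (the $\gamma_g$, $\theta_j$ and $\sigma_j$ below are placeholders for illustration, not the paper's estimates).

```python
import math
import random

# Simulating from the mixture model above under hypothetical parameter
# values (gamma_g, theta_j and sigma_j are placeholders, not estimates).

def norm_cdf(z):
    """Standard normal CDF Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

random.seed(42)

K = 3                                   # small K for readability (K = 8 in the application)
gamma_g = [0.2, 1.0, -0.5]              # index coefficients for one (d^S, d^T) cell
theta = {0: [1.0, 0.5, 0.0],            # mean coefficients for component D* = 0
         1: [2.0, 0.5, 0.3]}            # mean coefficients for component D* = 1
sigma = {0: 0.8, 1: 0.6}                # component standard deviations

def draw(e_x):
    """Draw (D*, Y) given a propensity-score vector e(x)."""
    p = norm_cdf(sum(g * e for g, e in zip(gamma_g, e_x)))  # p_g(e(x)) = Phi(e(x)'gamma_g)
    d_star = 1 if random.random() < p else 0
    mu = sum(t * e for t, e in zip(theta[d_star], e_x))     # mu_j(e(x)) = e(x)'theta_j
    return d_star, random.gauss(mu, sigma[d_star])

sample = [draw([1.0, random.random(), random.random()]) for _ in range(5000)]
share_d1 = sum(d for d, _ in sample) / len(sample)
print(f"simulated share with D* = 1: {share_d1:.3f}")
```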

B.2 MCMC algorithm
The goal of the MCMC algorithm is to approximate the posterior distribution of ξ given the data. For the case at hand, given a starting point ξ^(0), we generate a Markov chain whose invariant distribution is the posterior distribution of interest, by repeatedly sampling from the full conditional distributions derived below (a Gibbs sampler).

B.3 Prior distributions
To ease computation and to obtain closed-form solutions for the full conditional distributions, we consider the following priors for the parameters in ξ.

• Means of mixture components. For i = 0, 1 we set $\theta_i \sim N_K(\nu, V)$, with V = I, I being the identity matrix. The prior mean ν is chosen so that the resulting marginal prior distribution for the mean of the potential outcomes^21 is centered around the mean of the observed outcome Y. The variance of this prior distribution is also chosen to be sufficiently large (essentially spanning the observed range of Y), so that we do not impose any strong prior knowledge on the value of $\mu_i(e(x))$, i = 0, 1. The top left panel of Figure 2 reports the shape of this prior density.
• Variances of mixture components. For i = 0, 1 we set $\sigma_i^2 \sim IG(\alpha, \beta)$, that is, an inverse gamma distribution with density:^22
$$f(z; \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, z^{-(\alpha + 1)}\, e^{-\beta/z}, \qquad z > 0.$$

^21 This is defined as $\int \mu_j(e(x))\,dx$.
^22 If the random variable Z has a gamma distribution with parameters α and β, then $Z^{-1}$ has the inverse gamma distribution with parameters α and 1/β. The density is always finite, its integral is finite if α > 0, and it is the conjugate prior distribution for the variance parameter of a normal distribution. To simulate from an inverse gamma, one draws a sample X from the corresponding gamma variate and then computes 1/X.
The values of the shape α and scale β parameters were set to 2 and 1, respectively. The corresponding density function is reported in the top right panel of Figure 2.
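The gamma representation in footnote 22 translates directly into code; the following sketch (using the values α = 2, β = 1 above) is purely illustrative.

```python
import random

# Draw from the IG(alpha, beta) prior via its gamma representation
# (footnote 22): sample X from a gamma variate and return 1/X.
# alpha = 2 and beta = 1 match the prior used for sigma_i^2.

def rinvgamma(alpha, beta, rng):
    """One draw from an inverse gamma with shape alpha and scale beta."""
    # X ~ Gamma(shape=alpha, rate=beta)  <=>  1/X ~ IG(alpha, beta);
    # random.gammavariate is parameterized by shape and scale = 1/rate.
    return 1.0 / rng.gammavariate(alpha, 1.0 / beta)

rng = random.Random(0)
draws = [rinvgamma(2.0, 1.0, rng) for _ in range(100_000)]
share_below_one = sum(z <= 1.0 for z in draws) / len(draws)
print(f"P(sigma^2 <= 1) estimated at {share_below_one:.3f}")
# For IG(2, 1), the exact value is P(X >= 1) = 2/e, roughly 0.736
```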
• Index probability. We set $\gamma_g \sim N_K(\zeta_g, W)$. We select multivariate normal priors following Albert and Chib (1993), so as to ease sampling from the full conditional distributions (see below). In the application we set W = 0.5I, where I is defined as above. Note that we adopt a different prior distribution for each group defined by the combination of $d^S$ and $d^T$, summarized by g = 1, ..., K, so as to incorporate prior knowledge on the corresponding probabilities $p_g(e(x))$.

B.4 Full conditional distributions
The choice made on the prior distributions for the parameters involved implies that the full conditional distributions can be derived as follows.

B.4.1 Latent indicators D*

Given the parameters ξ, the full conditional distribution of each latent indicator $D^*$ is Bernoulli, with success probability proportional to the product of the class probability and the corresponding normal likelihood:
$$\Pr\big(D^* = 1 \mid \xi, y, e(x)\big) = \frac{p_g(e(x))\,\phi\big(y; \mu_1(e(x)), \sigma_1^2\big)}{p_g(e(x))\,\phi\big(y; \mu_1(e(x)), \sigma_1^2\big) + \big(1 - p_g(e(x))\big)\,\phi\big(y; \mu_0(e(x)), \sigma_0^2\big)},$$
where $\phi(\cdot\,; \mu, \sigma^2)$ denotes the normal density, so that one can easily draw values from this conditional distribution.
B.4.2 Index parameters γ_g

Following Albert and Chib (1993), we augment the probit specification for $p_g(e(x))$ with latent variables T, drawn from a normal distribution with mean $e(x)'\gamma_g$ and unit variance, truncated to be positive when the corresponding indicator $D^*$ equals one and negative otherwise. The conditional posterior distribution of $\gamma_g$ is then a multivariate normal, $\gamma_g \mid \xi, e(x) \sim N_K(\tilde{\zeta}_g, \tilde{W})$, where
$$\tilde{\zeta}_g = \big(W^{-1} + E_g'E_g\big)^{-1}\big(W^{-1}\zeta_g + E_g'T_g\big), \qquad \tilde{W} = \big(W^{-1} + E_g'E_g\big)^{-1},$$
with $E_g$ and $T_g$ being the matrices corresponding to e(x) and T, respectively, including only the rows for which G = g.
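The Albert and Chib (1993) data-augmentation step behind this posterior can be sketched as follows; the data, "current" indicator draws and prior values below are illustrative placeholders, and `statistics.NormalDist` is used for the truncated-normal draw.

```python
import numpy as np
from statistics import NormalDist

# One Albert-Chib step for a single group g: draw the latent utilities T
# from truncated normals, then gamma_g from its multivariate normal full
# conditional. All numerical values are illustrative placeholders.

rng = np.random.default_rng(1)
nd = NormalDist()

n_g, K = 400, 3
E_g = np.column_stack([np.ones(n_g), rng.uniform(size=(n_g, K - 1))])  # e(x) rows in group g
gamma_cur = np.array([0.5, 1.0, -1.0])                                 # current value of gamma_g
d_star = (E_g @ gamma_cur + rng.standard_normal(n_g) > 0).astype(int)  # current D* draws

zeta_g = np.zeros(K)          # prior mean
W = 0.5 * np.eye(K)           # prior covariance, as in Appendix B.3

def truncnorm_draw(mean, positive, rng):
    """Draw from N(mean, 1) truncated to (0, inf) if positive, else (-inf, 0)."""
    lo, hi = (nd.cdf(-mean), 1.0) if positive else (0.0, nd.cdf(-mean))
    u = rng.uniform(lo, hi)
    return mean + nd.inv_cdf(u)

# Step 1: latent utilities, sign-restricted by the current indicators
T_g = np.array([truncnorm_draw(m, d == 1, rng)
                for m, d in zip(E_g @ gamma_cur, d_star)])

# Step 2: conjugate multivariate normal update for gamma_g
W_inv = np.linalg.inv(W)
W_tilde = np.linalg.inv(W_inv + E_g.T @ E_g)
zeta_tilde = W_tilde @ (W_inv @ zeta_g + E_g.T @ T_g)
gamma_draw = rng.multivariate_normal(zeta_tilde, W_tilde)
print(np.round(zeta_tilde, 2))
```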

B.4.3 Conditional means of mixture components θ_j
The conditional posterior for $\theta_j$ is multivariate normal with mean vector:
$$\tilde{\nu}_j = \Big(V^{-1} + \sigma_j^{-2} S_j'S_j\Big)^{-1}\Big(V^{-1}\nu + \sigma_j^{-2} S_j'y_j\Big)$$
and variance:
$$\tilde{V}_j = \Big(V^{-1} + \sigma_j^{-2} S_j'S_j\Big)^{-1},$$
where ν and V are the prior mean and variance of $\theta_j$, and $S_j$ and $y_j$ are the matrix and vector obtained from e(x) and y by including only the rows for which $D^* = j$.
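This conjugate update can be sketched as follows (placeholder data and prior values, not the paper's estimates): a ridge-type posterior mean and covariance for a normal regression with known variance.

```python
import numpy as np

# Conjugate update for theta_j with placeholder data: posterior mean and
# covariance for a normal regression with known variance sigma2_j.

rng = np.random.default_rng(7)
n_j, K = 500, 3
S_j = np.column_stack([np.ones(n_j), rng.uniform(size=(n_j, K - 1))])  # e(x) rows with D* = j
theta_true = np.array([2.0, 0.5, 0.3])                                 # illustrative truth
sigma2_j = 0.6 ** 2
y_j = S_j @ theta_true + rng.normal(scale=np.sqrt(sigma2_j), size=n_j)

nu, V = np.zeros(K), np.eye(K)        # prior N_K(nu, V); placeholder values
V_inv = np.linalg.inv(V)
V_tilde = np.linalg.inv(V_inv + S_j.T @ S_j / sigma2_j)
nu_tilde = V_tilde @ (V_inv @ nu + S_j.T @ y_j / sigma2_j)
theta_draw = rng.multivariate_normal(nu_tilde, V_tilde)
print(np.round(nu_tilde, 2))
```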

B.4.4 Conditional variances of mixture components σ_j^2
The conditional full posterior distribution for $\sigma_j^2$ is inverse gamma with parameters:
$$\tilde{\alpha}_j = \alpha + \frac{n_j}{2}, \qquad \tilde{\beta}_j = \beta + \frac{1}{2}\sum_{i:\, D_i^* = j}\big(y_i - e(x_i)'\theta_j\big)^2,$$
where $n_j$ is the number of observations for which $D^* = j$.

B.4.5 Algorithm
The sampler alternates two main steps. First, it draws the latent indicators $D^*$ from their distribution given the model parameters ξ; then, it draws the model parameters ξ from their distribution given the indicators $D^*$.
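The alternation above can be sketched in compact code. The following self-contained example deliberately simplifies the model (a constant mixture weight with a Beta prior stands in for the probit index $p_g(e(x))$, to keep the sketch short) but preserves the two-step structure; all numerical values are placeholders.

```python
import numpy as np

# Compact Gibbs sketch of the two-step scheme: (i) draw D* given the
# parameters, (ii) draw the parameters given D*. Simplification: a constant
# mixture weight with a Beta prior replaces the probit index p_g(e(x)).

rng = np.random.default_rng(3)

# Simulated data from a two-component normal mixture (placeholder values)
n = 2000
d_true = rng.random(n) < 0.6
y = np.where(d_true, rng.normal(2.0, 0.6, n), rng.normal(0.0, 0.8, n))

alpha_p, beta_p = 1.0, 1.0            # Beta prior on the weight
m0, v0 = y.mean(), y.var() * 10       # loose normal prior on the means
a0, b0 = 2.0, 1.0                     # IG prior on the variances

p, mu, s2 = 0.5, np.array([y.min(), y.max()]), np.array([y.var(), y.var()])
burn_in, draws, keep = 200, 300, []

def normpdf(x, m, v):
    return np.exp(-0.5 * (x - m) ** 2 / v) / np.sqrt(2 * np.pi * v)

for t in range(burn_in + draws):
    # Step 1: latent indicators D* given the parameters
    w1 = p * normpdf(y, mu[1], s2[1])
    w0 = (1 - p) * normpdf(y, mu[0], s2[0])
    d = rng.random(n) < w1 / (w0 + w1)
    # Step 2: parameters given the indicators (conjugate updates)
    n1 = d.sum()
    p = rng.beta(alpha_p + n1, beta_p + n - n1)
    for j, mask in ((0, ~d), (1, d)):
        nj, yj = mask.sum(), y[mask]
        vt = 1.0 / (1.0 / v0 + nj / s2[j])
        mu[j] = rng.normal(vt * (m0 / v0 + yj.sum() / s2[j]), np.sqrt(vt))
        s2[j] = 1.0 / rng.gamma(a0 + nj / 2, 1.0 / (b0 + 0.5 * ((yj - mu[j]) ** 2).sum()))
    if t >= burn_in:
        keep.append((p, mu.copy(), s2.copy()))

p_hat = np.mean([k[0] for k in keep])
mu1_hat = np.mean([k[1][1] for k in keep])
print(f"posterior means: weight {p_hat:.2f}, upper-component mean {mu1_hat:.2f}")
```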
Convergence to the posterior distribution is obtained after a burn-in period set to a given number of iterations (10,000 in our application). All draws after convergence refer to the posterior distribution and are, by construction, autocorrelated. In our application, the number of retained random draws was set to 2,000.
The algorithm consists of the following steps (with t denoting the generic iteration):

• Initialize the chain at $\xi^{(0)} = \{\theta_j^{(0)}, \sigma_j^{2(0)}, \gamma_g^{(0)}\}$, for j = 0, 1 and g = 1, ..., K.
• For t = 1, 2, ...:
  – draw the latent indicators $D^*$ from their full conditional given $\xi^{(t-1)}$ (Section B.4.1);
  – draw the latent utilities T and the index parameters $\gamma_g^{(t)}$, g = 1, ..., K (Section B.4.2);
  – draw the component means $\theta_j^{(t)}$, j = 0, 1 (Section B.4.3);
  – draw the component variances $\sigma_j^{2(t)}$, j = 0, 1 (Section B.4.4).
• End for.

In Figure 3 we report the mixture components as they result from the algorithm. Reported in