Application of the EU-SILC 2011 data module “intergenerational transmission of disadvantage” to robust analysis of inequality of opportunity

This data article describes the original data, the sample selection process and the variables used in Andreoli and Fusco (2019) to estimate gap curves for a sample of European countries. Raw data are from the 2011 roster of EU-SILC, cross-sectional sample of the module “intergenerational transmission of disadvantage”. This article reports descriptive statistics of the using sample. It also discusses the algorithm adopted to estimate the main effects and details the content of additional Stata files stored in the online repository. These additional files contain raw estimates from bootstrapped samples, which form the basis for estimating gap curves and their variance-covariance matrices. The data article also reports representations of gap curves for all 16 selected countries.

sectional information on the socioeconomic background of origin of the individuals interviewed in EU-SILC, along with standard relevant measures of labour market outcomes. In particular, the 2011 module contains retrospective information about the parental background experienced by the respondents when aged between 12 and 16 (see Atkinson et al. [3] for the pros and cons of retrospective data). This unique database provides data that are, to a large extent, comparable, allowing similar definitions of the variables measuring outcomes and circumstances across countries and time.
Specifications table
Subject: Economics
Specific subject area: Public economics, welfare economics, inequality analysis, distribution methods, inference
Type of data: Table; Figure; Raw (

Value of the data
EU-SILC data represent the baseline survey introduced by the European Commission and managed by Eurostat to monitor and compare standards of living across European countries. Data are highly harmonized across countries and collected by central statistical institutes. This guarantees a high degree of comparability across countries in terms of the main variables we consider to define earnings opportunities and parental circumstances. Data are available free of charge in selected institutions in Europe (such as LISER). Users can apply for a visiting scheme which grants resources (material and knowledge-based) to the users of these data.

Based on raw EU-SILC 2011 module (cross-section) data, this article extrapolates information for a subset of
The sample selection process is based on males aged between 30 and 50 who worked full time as employees for at least 7 months in the income reference period. In addition, individuals who declared that they were living in another private household, foster home, collective household or institution were excluded. Following Raitano and Vona [4], intergenerational module weights are applied. The running sample that is used to produce Table 1 and Fig. 1 in [1] is made of 41,533 male respondents for whom we observe circumstances, earnings and demographics (age in years and a categorical variable for being married). Descriptive statistics of the distribution of those variables are reported in Table 1 below. The data are collected in the example_econletters.dta file in Stata format (optimized for Stata 13) available in the online repository. Figs. 1–16 in this article (see also [1]) are obtained from circumstances and earnings variables created from the raw data.
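The selection rules described above can be read as a simple filter on respondents. The following is a minimal Python sketch, not the authors' Stata code: the field names are hypothetical (they are not EU-SILC variable codes), and the age bounds are assumed to be inclusive.

```python
from dataclasses import dataclass


@dataclass
class Respondent:
    # Hypothetical field names, not actual EU-SILC variable codes.
    male: bool
    age: int
    months_full_time_employee: int
    # True if the respondent declared living in another private household,
    # foster home, collective household or institution.
    other_household_or_institution: bool


def in_running_sample(r: Respondent) -> bool:
    """Apply the selection rules stated in the text to one respondent."""
    return (
        r.male
        and 30 <= r.age <= 50  # assumption: both bounds are inclusive
        and r.months_full_time_employee >= 7
        and not r.other_household_or_institution
    )
```

Module weights and the availability of circumstances, earnings and demographics are not handled in this sketch.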
Circumstances. The 2011 EU-SILC module contains retrospective information about parents' educational attainment, occupational status, labour market activity status and family composition, as well as the presence of financial difficulties during respondents' teenage years. We focus on the educational attainment of the father as the relevant circumstance. To construct circumstances, individuals are first partitioned into three types (or groups) according to their father's education. The high education type consists of individuals who lived in a household where the father attained the first (e.g. bachelor, master or equivalent) or second (e.g. PhD or equivalent) stage of tertiary education. The medium education type consists of individuals who lived in a household where the father attained upper secondary education or post-secondary, non-tertiary education. Finally, the low education type consists of individuals who lived in a household where the father at most completed lower secondary education. Table 2 summarizes the circumstances assignment rule adopted.
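The three-type partition can be illustrated with a short Python sketch. It assumes, for illustration only, that the father's education is available as an ISCED-97 level from 0 to 6; the actual coding in the EU-SILC module and in Table 2 may differ, and the function name is hypothetical.

```python
def father_education_type(isced97_level: int) -> str:
    """Map an assumed ISCED-97 level (0-6) to the low/medium/high type."""
    if not 0 <= isced97_level <= 6:
        raise ValueError("expected an ISCED-97 level between 0 and 6")
    if isced97_level <= 2:
        return "low"      # at most lower secondary education
    if isced97_level <= 4:
        return "medium"   # upper secondary or post-secondary non-tertiary
    return "high"         # first or second stage of tertiary education


# Example: types for a father with lower secondary, upper secondary
# and first-stage tertiary education, respectively.
types = [father_education_type(level) for level in (2, 3, 5)]
```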
Earnings. Earnings correspond to annual gross employee cash or near cash income. This income measure is defined as the monetary component of the compensation in cash payable by an employer to an employee, and it includes the value of any social contributions and income taxes payable by an employee, or by the employer on behalf of the employee, to social insurance schemes or tax authorities. This variable reflects the relation between labour income and individual circumstances before state intervention. The observed earnings were converted into purchasing power standards (PPS) using the conversion rates provided on the CIRCABC user group. For reference, see: https://circabc.europa.eu/w/browse/3c60eeec-aca4-4db7-a035-0a6d892e6069.
The results reproduced in Table 1 and Fig. 1 in [1] are estimates from econometric models run on the selected running sample. These models make it possible to filter out residual uncertainty, to produce estimates of opportunity profiles at country level, and to compare these estimates across countries.

Experimental design, materials, and methods
Andreoli and Fusco [1] use earnings as a metric for opportunities (see also Andreoli and Fusco [2]). Two caveats apply. First, this variable is defined at the level of the individual, implying that labour supply decisions are assumed to be made at the individual level, thus neglecting household bargaining issues. Second, wages represent yearly evaluations of performance, since we focus on individuals who spent more than six months of the income reference period as full-time workers.
Opportunity profiles are estimated via Recentered Influence Function (RIF) methods (Firpo, Fortin and Lemieux [5]) to recover the effects of circumstances on earnings quantiles, while controlling for age and marital status. We estimate standard errors and variance-covariance matrices via bootstrap resampling procedures on baseline data, where stratification by country, year and region of residence ("psu" variable in example_econletters.dta) is accounted for (see Goedemé [6]).
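For readers unfamiliar with the method, the recentered influence function of the quantile q_tau is RIF(y; q_tau) = q_tau + (tau - 1{y <= q_tau}) / f(q_tau), where f is the earnings density. The Python sketch below computes these values at a decile. The Gaussian kernel and the rule-of-thumb bandwidth are assumptions made for illustration (the source does not state which density estimator is used), and the function names are hypothetical.

```python
import math
import statistics


def gaussian_kde_at(x0, sample, bandwidth):
    """Gaussian kernel density estimate of `sample` evaluated at x0."""
    total = sum(math.exp(-0.5 * ((x0 - x) / bandwidth) ** 2) for x in sample)
    return total / (len(sample) * bandwidth * math.sqrt(2 * math.pi))


def rif_at_decile(earnings, k, bandwidth=None):
    """RIF values of the k-th decile (k = 1..9) for every observation."""
    tau = k / 10
    q = statistics.quantiles(earnings, n=10)[k - 1]  # sample decile
    if bandwidth is None:
        # Assumption: normal-reference rule-of-thumb bandwidth.
        bandwidth = 1.06 * statistics.stdev(earnings) * len(earnings) ** (-0.2)
    f_q = gaussian_kde_at(q, earnings, bandwidth)
    # Each observation gets one of two values, depending on whether
    # it lies at or below the decile.
    return [q + (tau - (1.0 if y <= q else 0.0)) / f_q for y in earnings]
```

Regressing these values by OLS on indicators of the circumstance types, age and marital status gives the RIF regression coefficients; that step is not shown here.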
The estimation algorithm proceeds as follows:
1) draw a bootstrapped sample from the using sample;
2) estimate RIF regression parameters, income levels and pdf at given preselected deciles for each bootstrapped sample;
3) calculate gap curves for each country, differences in gap curves across countries for each pair of types, and aggregated inequality of opportunity (IOp) indices for each country and their variations across countries;
4) reiterate the bootstrap procedure 250 times;
5) compute averages and standard errors of gap curves, differences in gap curves and IOp indices, and store the estimates;
6) produce graphs of gap curves and of their 95% confidence intervals, based on bootstrapped standard errors at the specific earnings deciles identified in point 2);
7) estimate variance-covariance matrices from bootstrapped data, use them to test the relevant hypotheses, and count the cases (based on pairwise comparisons of types) in which a hypothesis is accepted or rejected;
8) report estimates in the form of tables.
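The resampling logic of steps 1), 4), 5) and 6) can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the statistic is a plain difference between two types' raw earnings deciles, used as a simplified stand-in for the RIF-based gap curve estimates, the confidence interval uses the normal approximation (mean plus or minus 1.96 standard errors), and all function names are hypothetical.

```python
import random
import statistics
from collections import defaultdict


def stratified_resample(rows, rng):
    """Resample (stratum, type, earnings) rows with replacement within strata."""
    by_stratum = defaultdict(list)
    for row in rows:
        by_stratum[row[0]].append(row)
    sample = []
    for stratum_rows in by_stratum.values():
        sample.extend(rng.choices(stratum_rows, k=len(stratum_rows)))
    return sample


def decile_gap(rows, k, type_a, type_b):
    """Difference between the k-th earnings deciles of two types.

    Simplified stand-in statistic; each type needs at least two rows.
    """
    q_a = statistics.quantiles([e for _, t, e in rows if t == type_a], n=10)[k - 1]
    q_b = statistics.quantiles([e for _, t, e in rows if t == type_b], n=10)[k - 1]
    return q_a - q_b


def bootstrap_gap(rows, k, type_a, type_b, reps=250, seed=0):
    """Bootstrap mean, standard error and 95% interval of the decile gap."""
    rng = random.Random(seed)
    draws = [
        decile_gap(stratified_resample(rows, rng), k, type_a, type_b)
        for _ in range(reps)
    ]
    mean = statistics.fmean(draws)
    se = statistics.stdev(draws)
    return mean, se, (mean - 1.96 * se, mean + 1.96 * se)
```

Keeping the replications in a list, as `draws` does here, mirrors the idea of storing raw bootstrap estimates so that covariances across deciles can be computed afterwards.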
The estimation procedure generates additional data, essentially estimates from the baseline specification of the econometric model, which are then elaborated to produce tables of results. Additional data are stored in the folder "\output" of the data folder available in the repository. Notably, this folder contains the following datasets, all created from the resampling procedure:
- bs_frale.dta: reports estimates of regression coefficients for RIF regressions, by country (country), income decile (percentile) and bootstrapped replica (rep).
- bs2_frale.dta: reports estimates of income deciles (pdf_pcty_X) and the corresponding type-specific pdf level (pdf_pcty_X) for each circumstance type X = 1,2,3, by country (country), income decile (percentile) and bootstrapped replica (rep).
is rejected or not for each comparison (accept_X) and then reports the number of cases where $H_0^{IOp}$ is rejected or accepted.
- GO_bs.dta: reports estimates of the GO index by country and of differences in the GO index across countries. Bootstrapped standard errors are reported for levels and differences in the GO index.
against unrestricted alternatives require imposing equality constraints on vectors of parameter estimates that are jointly normally distributed (by assumption). Tests that put failure of gap curve dominance at the null against strong dominance at the alternative (a test adopted in [1] to verify gap curve dominance in those cross-country comparisons where $H_0^{IOp}$ is rejected) can be obtained from t-tests for differences in gap curves at specific quantiles (see Andreoli [7,8] for a discussion and application of these tests).
Fig. 1 in Andreoli and Fusco [1] is obtained by stacking graphs of gap curves of selected countries. All gap curves (and their 95% confidence intervals) estimated from the running sample are reported below. The figures are obtained from data in gapcountry.dta and are collected in the folder \output\graphs in the repository.