In-work Benefits in Belgium: Effects on Labour Supply and Welfare

Following the example of other countries, Belgium has implemented in-work benefit policies since the early 2000s, with the objective of increasing employment rates and fighting poverty. Belgian in-work benefits differ from most other in-work benefits in that eligibility requires low hourly earnings. We study the effects that extensions of those benefits would have on both labour supply and welfare, using a random-utility random-opportunity model estimated on cross-sectional SILC datasets. Results show that further increasing the benefits would slightly increase the labour supply and welfare of low-to-middle income deciles, but at a very high net cost per job created. We compare our results with existing research and explain some mechanisms that possibly led to an underestimation of negative intensive-margin labour supply responses in previous simulations.


State of research
2.1. International examples of in-work benefits
The most famous and most researched examples of in-work benefits are the US Earned Income Tax Credit (EITC) and the UK Working Tax Credit (WTC). Besides those two schemes, we also give some attention to the French PPE, which shares with the Belgian WB the feature of targeting benefits at individuals with low hourly earnings. Other IWBs that have received significant attention from academic researchers are the German Mini-jobs reform of 2003 and the Self-Sufficiency Project experiment in Canada in the 1990s. A detailed overview of IWB policies in OECD countries and a summary of research findings are available in Immervoll and Pearson (2009).
EITC. The EITC, enacted in 1975 and still in force, is a refundable tax credit for low- to moderate-income working individuals and couples. In this scheme, the amount of the benefit depends on a recipient's earned income and number of children. Most studies indicate that the scheme has a positive effect on the labour force participation of single mothers (Crandall-Hollick, 2016). 5 Regarding the impact on married secondary earners' decisions to start working, research is still inconclusive. Looking at the intensive margin, most of the empirical evidence indicates that the EITC has had a negligible to small effect on the number of hours worked by unmarried people, while it has had a negative effect on the hours worked by secondary earners. Lastly, as the EITC was not primarily focused on childless adults, their opportunity sets are less affected and little information has been published regarding potential effects on their labour supply decisions.
WTC. The ancestor of the current WTC was introduced in Britain in 1970, under the name Family Income Supplement. 6 The WTC is similar to the EITC but is more generous and is based on net (rather than gross) family income. It has been shown consistently across a number of studies (see Brewer et al. (2006) for a discussion) that the scheme significantly increased the participation of single parents. Moreover, it has been estimated that the WFTC reduced the labour supply of married women, at both the intensive and the extensive margin, but that this was more than compensated by labour supply increases among married men.
PPE. The PPE was introduced in 2001, reformed a couple of times in later years, and lasted until 2015, when it was merged with another IWB, the ''RSA activité'', into the ''Prime d'activité''. Its value is lower than that of the EITC and WFTC schemes, peaking at a few hundred EUR per year. Eligibility is based on full-time equivalent earnings. The benefit also depends on the family situation, and the scaling with respect to working time is not linear, so that the scheme still creates a part-time premium. Almost all studies (see Sterdyniak (2007) and Arnaud et al. (2008) for a detailed overview) find positive but very small employment effects of less than 0.5 percentage points. Stancanelli (2008) even suggests employment losses among married women and no significant gains for non-married women.

In-work benefits in Belgium
2.2.1. History of in-work benefits
The first in-work benefits were deployed at the turn of the century, with the goal of reducing structural unemployment. They were introduced on the one hand through a reduction of social security contributions in December 1999, 7 and on the other hand through a refundable earned income tax credit, gradually phased in and phased out and conditional on working at least 13 hours a week, in the summer of 2001. 8 Both reforms were intended for low-wage earners. In 2005, those two policies were abolished and replaced by the WB, an extended social security contribution reduction for low-wage earners. The main reason was that the WB, by depending on full-time equivalent labour income instead of actual labour income, avoided part-time traps.
Also, the WB had an immediate effect on monthly net earnings, while the tax credit was only computed after the fiscal year: by moving towards an extended WB, the policy maker aimed to make the link between working and the benefit more obvious and thereby to strengthen the perception of work incentives.
5. However, this wide consensus among economists has recently been challenged by a new study by Henrik Kleven, which shows that the employment increases align closely with the confounding effects of welfare reforms and a booming macro-economy. He concludes that the case for sizable extensive margin effects of the EITC is fragile (Kleven, 2019).
6. The scheme changed a couple of times, and its name changed successively to Family Credit and Working Families Tax Credit, before its final name Working Tax Credit (combined with Child Tax Credit). Most of the evaluations of employment effects refer to the period when the IWB was named WFTC and was significantly increased.
7. Loi du 20 Décembre 1999 visant à octroyer [un bonus à l'emploi sous la forme d'] une réduction des cotisations personnelles de sécurité sociale aux travailleurs salariés ayant un bas salaire [et à certains travailleurs qui ont été victimes d'une restructuration]. (Law of 20 December 1999 granting [a work bonus in the form of] a reduction of personal social security contributions to employees with a low wage [and to certain workers who were victims of a restructuring])
8. Loi du 10 Août 2001 portant réforme de l'impôt des personnes physiques. (Law of 10 August 2001 reforming the personal income tax)
In 2007, using fiscal freedom gained in 2001 in the Lambermont agreements, the Flemish region introduced the ''jobkorting'' (JK). 9 The JK gave a tax credit to people earning more than 5,500 EUR/year, which phased out between 21,000 EUR/year and 22,250 EUR/year. As was the case for the first federal income tax credit, eligibility for the JK scheme depended on actual earnings. In 2011, however, for budgetary reasons and after a threat from the European Commission to start a procedure against the measure at the European Court for discrimination, 10 the measure was abolished.
In 2011, the WB was complemented with a ''fiscal work-bonus'' (FWB) that reduces the personal income tax for those eligible for the WB, to further reduce unemployment traps. The level of the FWB is a fixed percentage of the WB. In 2015, the government increased the WB and FWB schemes further as part of a broader tax-shift aiming at reducing the tax burden on labour. 11 In September 2019, the Flemish government announced the introduction of a large in-work benefit scheme, the "Jobbonus" (JB). According to the Flemish government, the policy will be one of the main measures to get 120,000 more people into work in Flanders. At the time of writing, the exact parameters are not known, but the government cites a figure of 50 EUR extra net monthly income for low-wage earners, and a complete phase-out at a gross monthly wage of 2500 EUR. It estimated the budget for the measure at around 350 million EUR yearly.

Research findings
As explained in the introduction, Belgian in-work benefits are quite innovative in the sense that they depend on the full-time equivalent gross labour income and not on the actual labour income, which is supposed to create fewer negative labour supply incentives at the intensive margin of the labour market. Orsini (2006a) and Dagsvik et al. (2011) discuss this particular feature in more detail and provide the first rigorous analysis of the effects of the Belgian WB. They use a discrete-choice labour supply model and compare the policy with some alternative situations, including the tax credit on low earnings that was temporarily implemented in 2001-2004. They conclude that both measures have a positive impact on labour supply but that the WB is more efficient as it avoids the ''part-time trap'' created by the tax credit system. Decoster and Vanleenhove (2012) analyze the Flemish Jobkorting (JK). The authors compare the JK with 2 alternative JK scenarios and show that for all three types of tax credits, labour supply reactions are negative at the intensive margin and positive at the extensive margin, with a slightly positive net effect. However, the compensatory effects (specifically the increases in tax and social security contribution revenues due to increases in labour supply) are small and the costs of the measure are therefore substantial. Vandelannoote and Verbist (2016) analyze various types of hypothetical in-work benefits, playing with income thresholds, tapering mechanisms, individual or household schemes, etc., to give an overview of how the design of the scheme can lead to different effects on poverty and employment. 12 They conclude that such benefits imply a trade-off between poverty reduction and labour market activation, which has to be considered in light of the aim of the policy.
Regarding labour market activation, individual schemes with an income threshold and a tapering-in mechanism seem to work best, while household schemes perform best when looking at poverty reduction.
Regarding the future Flemish Jobbonus, Decoster and Vanheukelom (2019) question to what extent those benefits will be effective in increasing labour supply, given already existing federal in-work benefits, and underline the risk of low-wage traps due to high effective marginal tax rates for low-wage earners. Moreover, they show that the purchasing power gains are spread over the income distribution and not concentrated among households with low disposable income. Finally, they estimate the cost of the measure at 174 million EUR, considerably less than what the Flemish government announced.
9. Note that this measure is not the same as the "Federale Jobkorting", the name given to an increase in the amount of deductible professional expenses.
10. The measure was accused of discriminating against people working in Flanders but living outside Flanders, as those would not have the right to the jobkorting.
11. Other measures that are part of this tax shift include: decreases of the personal income taxes through expansions of the brackets and the abolition of the 30% bracket, decreases of the social security contributions for employers and the self-employed, the introduction of an increased and uniform tax-free amount, etc.
12. See also the reworked version Vandelannoote and Verbist (2019).

The Work-Bonus: policy details
The WB is a Belgian in-work benefit that consists of a reduction of social security contributions for individuals with low earning capacity. The level of the WB depends on the full-time equivalent (hereafter FTE) gross salary of the employee, denoted $W_{ft}$: for FTE gross salaries lower than a certain threshold $\theta_1$, the WB is equal to a fixed amount $A$ weighted by the work regime, calculated as $L / L_{FT}$, where $L$ is the number of hours worked and $L_{FT}$ corresponds to a full-time regime. For FTE gross salaries higher than $\theta_1$, the fixed amount is tapered out at a rate of $\rho_1$, until it reaches 0 at $\theta_2$. As explained in subsection 2.2.1., the WB is complemented with a fiscal work-bonus. The fiscal work-bonus is a tax reduction that is calculated as a percentage $\rho_2$ of the WB. The WB and FWB granted can thus be written:
$$WB = \begin{cases} A \cdot \dfrac{L}{L_{FT}} & \text{if } W_{ft} \le \theta_1 \\ \left(A - \rho_1 (W_{ft} - \theta_1)\right) \cdot \dfrac{L}{L_{FT}} & \text{if } \theta_1 < W_{ft} \le \theta_2 \\ 0 & \text{if } W_{ft} > \theta_2 \end{cases} \qquad (1)$$
$$FWB = \rho_2 \cdot WB \qquad (2)$$
The values of the above-mentioned parameters for white-collar workers for the year 2016 are given in Table 1. 13 Note that the WB cannot exceed social security contributions, and that the fiscal bonus cannot exceed a maximum value (set at 640 EUR/year in 2016).
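The schedule described above can be sketched in a few lines of Python. This is a minimal illustration, not the EUROMOD implementation: the function names and all parameter values are hypothetical placeholders (they are not the Table 1 values), and the two caps mentioned in the text (social security contributions due and the maximum fiscal bonus) are left out for brevity.

```python
def work_bonus(w_ft, hours, hours_ft, A, theta1, rho1):
    """Work-bonus for an FTE gross salary w_ft and a work regime hours / hours_ft."""
    if w_ft <= theta1:
        full_time_amount = A
    else:
        # Tapered out at rate rho1; reaches 0 at theta2 = theta1 + A / rho1.
        full_time_amount = max(0.0, A - rho1 * (w_ft - theta1))
    # The amount is weighted by the work regime L / L_FT.
    return full_time_amount * hours / hours_ft


def fiscal_work_bonus(wb, rho2):
    """Fiscal work-bonus: a fixed share rho2 of the WB."""
    return rho2 * wb


# Hypothetical parameters for illustration only (NOT the Table 1 values).
PARAMS = dict(A=200.0, theta1=1500.0, rho1=0.2)

wb_full = work_bonus(1500.0, 38, 38, **PARAMS)  # at the threshold, full time
wb_half = work_bonus(2000.0, 19, 38, **PARAMS)  # in the taper zone, half time
wb_out = work_bonus(2500.0, 38, 38, **PARAMS)   # at theta2: fully tapered out
fwb_half = fiscal_work_bonus(wb_half, rho2=0.15)
```

Because the schedule depends on the FTE salary and is only scaled by the work regime, a half-time worker with a given hourly wage receives half the full-time amount, which is the feature that distinguishes the WB from credits based on actual earnings.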

Model
To model workers' labour supply decisions, we estimate a random utility-random opportunity (RURO) model, where labour supply is seen as the outcome of agents choosing from a set of job offers. Our model builds on Capéau et al. (2016). For a detailed derivation of the RURO model, see also Dagsvik and Strøm (1992).
We first discuss the database we use and the filtering we apply in subsection 4.1. Then we go through the building blocks of the model in subsections 4.2. and 4.3.: the utility functions and the opportunity functions. In section 4.4. we calculate the probability that an individual chooses a job with a given wage and number of required hours. Having the actual hours worked in the database, we can compute the parameters of the utility and opportunity functions that maximize the likelihood that the sample was observed. This procedure is also explained in section 4.4.
13. The amount of the basic reduction is slightly different for blue- and white-collar workers. Since it is not possible to distinguish between these types based on SILC data, only the amount for white-collar workers is given here and simulated in EUROMOD.

Data
To estimate our model, we use the Belgian database of the European Union Statistics on Income and Living Conditions (EU-SILC). This database is representative of the Belgian population and contains information on income, socio-demographic situation and various other dimensions related to labour supply. We assume that the preferences of individuals did not change over the period covered by the different EU-SILC surveys used in our estimation (2006-2017). 14 If preferences did indeed not change significantly over time, our method increases the accuracy of the estimates, on the one hand because there are more observations, and on the other hand because people are observed in different tax-benefit contexts, which helps the identification of preferences. The tax-benefit systems of the corresponding years are used to calculate the disposable incomes (section 4.3.), which are up-rated with consumer price indices to render them comparable. We consider two types of households: single workers (hereafter: singles) and couples of workers (hereafter: couples). Singles are defined as households with one adult person who is available to the labour market (i.e. aged between 16 and 64 years and not sick, in education, disabled or (pre-)retired). Couples are households with 2 adult people who are available to the labour market and who form a couple. All other types of households are dropped, as their labour supply decision processes might differ too much from the typical cases (e.g. households with one parent and one child available to the labour market are excluded). We also exclude people declaring outlying numbers of weekly hours worked (>70 h/week) and outlying wages (>60 EUR/h), as well as the self-employed, for whom information on labour income and hours worked might be less reliable.
The percentage of individuals available to the labour market who are included in our sample varies from 64.9% to 69.4%, depending on the survey year. Descriptive statistics of the selected households are given in Table 2.
A histogram of the hours worked, by gender, is displayed in Figure 2. The hours worked are those declared by the surveyed individuals when asked how many hours per week they work on average (including overtime) in their main and complementary jobs. It is important to note that in our sample, overtime workers (whom we define as workers working more than 40 hours per week, as will be explained in 4.3.) represent 19% of the population. This is substantially more than what is observed in administrative data, where overtime hours are due to people combining different jobs and represent less than 2% of situations. 15 While self-reported data is known to be vulnerable to exaggeration or biases, 16 we believe that this discrepancy reveals that many workers work more than contractual (or legal) hours, which may lead to different estimates of hourly wages, as these are computed as gross earnings divided by hours worked. Our assumption is that people working more than contractual hours are generally compensated by a higher wage (in other words, they accept a job that is expected to require frequent overtime if this job provides a sufficiently high salary). This assumption seems supported by the fact that people working overtime declare higher monthly earnings. In this regard, using survey data might bring an insightful and more accurate estimation of hourly wages, especially when used as a proxy for earning capacity. Moreover, as tax-benefit reforms might alter the relative attractiveness of such overtime opportunities (see subsection 5.2.), including them more accurately in the opportunity sets of individuals will provide a more complete picture of likely behavioural changes. The profiles of those overtime workers are detailed in Table 3.
14. We use the data as different cross-sections (we abstract from the fact that a share of individuals might appear in different surveys).
15. Estimated based on figures 2 and 3 in Dagsvik et al. (2011). Note also that there are very specific situations where, for technical and organizational reasons, the law authorizes derogations from the 38 (to 40) hours week, but those should mostly be compensated by periods of rest in a way that average weekly hours do not exceed the legal working week.
16. Such as the social desirability bias (a tendency of survey respondents to answer questions in a manner that will be viewed favorably by others), which could for example result from individuals wishing to be seen as hard-working.

Utility
Let $U_{ij}(d_j, l_j, \epsilon_{ij})$ denote the utility of an individual $i$ choosing a job $j$, and let an analogous function of disposable income, the leisure of both partners and a taste shifter denote the utility of couple $i$ choosing jobs $j$ and $k$, where $d$ is the disposable income, $l$ the weekly hours of leisure (equal to the time endowment minus the number of working hours required for the job opportunity) and $\epsilon$ a taste shifter corresponding to the job choice(s). Superscripts $m$ (male) and $f$ (female) represent the respective variables of the members of a couple. The utility function is assumed to be decomposable into a deterministic function and the random taste shifter, which represents utility derived from attributes of a job that are unobserved by the researcher. This taste shifter is assumed to be i.i.d. across job combinations and households according to a Gumbel distribution with location parameter 0 and scale parameter 1.
The deterministic part of the utility function is assumed to have a Box-Cox structural specification 17 for singles (Equation 3), where the parameters are allowed to differ for single males and single females, and for couples (Equation 4), for which a cross-leisure term is added. We allow for heterogeneity in the marginal rates of substitution between leisure and income by introducing covariates related to age, number of children in different age categories, region and education linearly into the leisure parameters ($\alpha_l$, $\alpha^m_l$, $\alpha^f_l$).
17. We abandon i, j and k subscripts for clarity.
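To make the structure of such a deterministic utility more concrete, here is a minimal Python sketch. It assumes a commonly used Box-Cox form, in which income and leisure each enter through a transform $(x^{e} - 1)/e$ multiplied by a coefficient, with the leisure coefficient given by a linear index of covariates and, for couples, an additional product term in the partners' leisure. All function and parameter names are hypothetical, and the sketch is not a restatement of Equations 3 and 4.

```python
import math


def box_cox(x, exponent):
    """Box-Cox transform (x**e - 1) / e, with the log limit when e is 0."""
    if x <= 0:
        raise ValueError("x must be positive")
    if abs(exponent) < 1e-12:
        return math.log(x)
    return (x ** exponent - 1.0) / exponent


def leisure_coefficient(constant, coefs, covariates):
    """Leisure coefficient as a linear index of covariates (assumed form)."""
    return constant + sum(c * x for c, x in zip(coefs, covariates))


def utility_single(d, l, b_d, e_d, b_l, e_l):
    """Deterministic utility of a single: income term plus leisure term."""
    return b_d * box_cox(d, e_d) + b_l * box_cox(l, e_l)


def utility_couple(d, l_m, l_f, b_d, e_d, b_m, e_m, b_f, e_f, b_cross):
    """Deterministic utility of a couple, with a cross-leisure term."""
    leisure_m = box_cox(l_m, e_m)
    leisure_f = box_cox(l_f, e_f)
    return (b_d * box_cox(d, e_d)
            + b_m * leisure_m
            + b_f * leisure_f
            + b_cross * leisure_m * leisure_f)
```

With positive coefficients, utility in this sketch increases in both disposable income and leisure, and the sign of `b_cross` determines whether partners' leisure times are complements or substitutes.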

Opportunities
We assume hourly wages are drawn from a log-normal distribution $g_1(w)$ and are independent of hours worked:
$$g_1(w) = \frac{1}{w \sigma \sqrt{2\pi}} \exp\left(-\frac{(\ln w - X_w \gamma)^2}{2\sigma^2}\right)$$
where $\sigma$ and the vector $\gamma$ are parameters. $X_w$ is a vector of covariates that might affect the median of the wage distribution: education, gender, (potential) experience and survey year.
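As a small illustration of the log-normal wage-offer distribution, the following Python sketch evaluates its density and median for a given covariate vector. It assumes that the covariates enter through a linear index that equals the mean of log wages, so that the median wage offer is the exponential of that index. The function names and example values are hypothetical.

```python
import math


def log_wage_index(x_w, gamma):
    """Linear index X_w * gamma, i.e. the mean of log wage offers."""
    return sum(g * x for g, x in zip(gamma, x_w))


def wage_median(x_w, gamma):
    """Median of the log-normal wage-offer distribution."""
    return math.exp(log_wage_index(x_w, gamma))


def wage_density(w, x_w, gamma, sigma):
    """Log-normal density g1(w) of hourly wage offers."""
    if w <= 0:
        return 0.0
    z = (math.log(w) - log_wage_index(x_w, gamma)) / sigma
    return math.exp(-0.5 * z * z) / (w * sigma * math.sqrt(2.0 * math.pi))


# Hypothetical covariates (constant, high-education dummy) and coefficients.
example_median = wage_median([1.0, 1.0], [2.5, 0.4])
```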
Average weekly working hours of job opportunities are assumed to be drawn from a uniform-with-peaks distribution, where the peaks are chosen to correspond to typical part-time regimes as well as the full-time regime, and where parameters allow their height to be calibrated. Peak height can differ only by gender, reflecting the fact that part-time jobs are mostly available in female-dominated sectors of activity (Meulders and O'Dorchai, 2009): health and social work, other community, social and personal service activities, private households with employed persons and also, although to a lesser extent, education. The domain H represents the possible values, and is assumed to range from 0 to 70.
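One way to picture a uniform-with-peaks distribution is as a flat density on the hours domain that is scaled up inside narrow intervals around the typical regimes. The Python sketch below follows that idea. The peak locations, the interval half-width and the multiplicative (exponential) form of the peak heights are assumptions made for illustration, not the specification estimated here.

```python
import math


def hours_density(h, peaks, h_max=70.0, half_width=1.0):
    """Uniform-with-peaks density on [0, h_max].

    peaks: list of (centre, height); inside [centre - half_width,
    centre + half_width] the flat density is multiplied by exp(height).
    Assumes the peak intervals do not overlap and lie inside the domain.
    """
    if not 0.0 <= h <= h_max:
        return 0.0
    # Normalising constant: flat part plus the extra mass under each peak.
    total = h_max + sum(2.0 * half_width * (math.exp(height) - 1.0)
                        for _, height in peaks)
    log_weight = sum(height for centre, height in peaks
                     if abs(h - centre) <= half_width)
    return math.exp(log_weight) / total


# Hypothetical peaks: half-time, four-fifths and full-time regimes.
PEAKS = [(19.0, 1.0), (30.0, 0.5), (38.0, 2.0)]
```

Setting all heights to zero gives back the plain uniform density on the domain, so the heights can be read as deviations from a flat distribution of offered hours.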
In addition, a number of ''out-of-market'' job opportunities can be available to the individuals. The intensity of job offers relative to out-of-market opportunities is allowed to vary across individuals, according to gender and a set of covariates.
where Xo is the vector of covariates that includes region, education level and group-specific unemployment rate. The group-specific unemployment rate corresponds to the unemployment rate per gender-education group, taken from Eurostat. Those rates vary across years and are assumed to be inversely related to the number of suitable jobs accessible to those groups of individuals at a particular point in time. The inclusion of this variable could therefore improve the identification of job-offer intensities.
For each job opportunity requiring a number of hours h, the gross wage is computed as the product of the required hours h and the hourly wage w. EUROMOD is then used to calculate the disposable income corresponding to the job choice: it takes gross income values as an input and determines, 18 based on the socio-demographic characteristics of the households, the taxes they are subject to and the benefits they are eligible for, which are respectively subtracted from and added to the gross income. This transformation is denoted di(l, w).
We assume that people opting for an out-of-market opportunity apply for means-tested social assistance. In practice, people not working are eligible for unemployment benefits under the condition that they have a sufficient work history and are involuntarily unemployed. When, on the contrary, an employee voluntarily quits his job, he can be subject to a penalty of some weeks (from 4 to 52) before being eligible for unemployment benefits. If the employee leaves his job with the intention of relying on unemployment benefits in the long run, he can even be sanctioned indefinitely. Those not eligible for unemployment benefits can apply for social assistance. In that case a thorough means-test, encompassing also the means of relatives and assets owned, can be expected. Those who are ineligible for or do not apply to those two schemes (due to stigmatization for example) probably rely on some types of private transfers, personal savings, informal work, mendicancy, etc., as basic needs have to be fulfilled. It is difficult to infer from the data in which of those situations workers opting for an out-of-market opportunity would end up, due to limited information on their preferences and financial situation. For those who are observed not working the same assumption is made, even though some report unemployment benefits. The reason is that it would be difficult to consider those benefits, which are observed at one point in time, as representative of the level of revenue those individuals (expect to) receive when not working, as such benefits decrease considerably over time.
18. An EU-wide arithmetic tax-benefit model, widely used by the European Commission for policy analysis and developed and managed by the Institute for Social and Economic Research (ISER) at the University of Essex.
Considering means-tested social assistance benefits as the out-of-labour income is therefore a simplification needed to compensate for the lack of information, and an approximation of a consumption floor for survival. An additional argument could be that, especially since the reform of 2012, 19 the declining unemployment benefits come near (and even drop below, for cohabitants) social assistance benefit levels in the long term (see Galand and Termote, 2014), and that our assumption thus does not misrepresent too much the evaluation of out-of-market opportunities by forward-looking agents with reasonably low discount rates counting on unemployment benefits.

Probability and MLL estimation
The deterministic part of the utility of individual i is denoted Vi(di(l, w), l). The likelihood that an individual i will choose a particular job offer requiring labour time h = T − l and paying a wage w can, given our assumptions on the random utility term, be written as in Dagsvik and Strøm (1992), with one expression in case of a market opportunity and another in case of an out-of-market opportunity. In practice, we do not observe the set of wage and working-time offers. We therefore draw a set of offers for each individual, denoted Di, from a prior density function, denoted S, conditional on the observed choice being included. 20 The probability of observing a given choice, conditional on this choice being in the drawn set of offers, is therefore given by one expression for market opportunities and another for out-of-market opportunities.
19. Arrêté royal du 23 Juillet 2012 modifiant l'arrêté royal du 25 novembre 1991 portant la réglementation du chômage dans le cadre de la dégressivité renforcée des allocations de chômage, Moniteur belge, 30 Juillet 2012. (Royal decree of 23 July 2012 amending the Royal decree of 25 November 1991 on unemployment regulations, in the context of the reinforced degressivity of unemployment benefits)
20. We use uniform distributions for the hours (from 0 to 70) and hourly wages (from 0 to 60). The prior probability to draw an out-of-market offer is set at 0.10.
As we know the probability that an individual works a given number of hours, given his characteristics and given the parameters, we can compute the likelihood that our observed sample was indeed observed, by multiplying those probabilities over all observations. We then look for the parameters that maximize the log-likelihood by using the Broyden-Fletcher-Goldfarb-Shanno (BFGS) optimization algorithm (Broyden, 1970; Fletcher, 1970; Goldfarb, 1970; Shanno, 1970), a quasi-Newton method for solving unconstrained nonlinear optimization problems. The estimated parameters are in Appendix A (Tables A1-A4). Table 4 summarizes the covariates used in the different building blocks of the model, which are similar to those used in Capéau et al. (2016), with the exception of the year dummies in the wage offers. For a discussion regarding the identification of RURO models, see for example Capéau et al. (2016) or Aaberge et al. (1999).
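The estimation logic can be sketched as follows. The Python code below assumes the standard correction for sampled choice sets in a logit model with Gumbel taste shifters: each drawn alternative receives a weight equal to the exponential of its deterministic utility, multiplied by its opportunity density and divided by the prior density used to draw it. The function names and the data layout are hypothetical, and the sketch does not reproduce the exact expressions used in the estimation.

```python
import math


def choice_probabilities(values, opportunity_density, prior_density):
    """Choice probabilities over one sampled set of offers.

    values[k]: deterministic utility of alternative k
    opportunity_density[k]: model density of offer k
    prior_density[k]: density of the prior used to draw offer k
    """
    log_weights = [v + math.log(g) - math.log(s)
                   for v, g, s in zip(values, opportunity_density, prior_density)]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_weights)
    weights = [math.exp(lw - top) for lw in log_weights]
    total = sum(weights)
    return [w / total for w in weights]


def log_likelihood(sample):
    """Sum of log-probabilities of the observed choices.

    sample: iterable of (values, opportunity_density, prior_density, chosen),
    where chosen is the index of the observed choice in the drawn set.
    """
    return sum(math.log(choice_probabilities(v, g, s)[chosen])
               for v, g, s, chosen in sample)
```

In an actual estimation, the utilities and opportunity densities would be functions of the parameter vector, and the negative of `log_likelihood` would be passed to a numerical optimizer, for instance `scipy.optimize.minimize` with `method="BFGS"`.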

Preferences
Preferences are defined over the positive ''leisure time - disposable income'' domain, with heterogeneity allowed in the leisure parameter. We estimate that utility increases with both disposable income and leisure time for almost all individuals, as summarized in Table 5. Higher education, in contrast, decreases preferences for leisure time. We illustrate this with indifference curves of individuals with different education levels in Figure 3a and Figure 3b. Moreover, the number of children younger than 7 years old increases the importance of leisure time, except for single men.

Opportunities
The estimated percentage of job offers in workers' opportunity sets is given in Figure 4. One can observe that higher education strongly increases the number of job opportunities. For women, higher unemployment rates in an age category slightly decrease the number of job offers for those in that category, in line with the intuition that higher unemployment rates increase competition between the unemployed and diminish the probability of finding suitable jobs. For men, however, this effect is estimated to be smaller and of the opposite sign, resulting in young men having more job offers despite higher unemployment rates. The estimated wage distributions and their evolution are given in Figures 5 and 6. Education has the strongest positive impact on wage offers. Males also receive slightly higher wage offers than females. Years of (potential) experience, calculated as age minus expected age of labour market entry, 21 have a positive impact on wage offers, up to a certain limit (34 for men and 35 for women) at which the person is close to retirement and possibly becomes less attractive for employers, resulting in lower wage offers.
The estimated hours distributions are illustrated in Figure 7a and Figure 7b. We estimate that men receive more full-time job offers than offers of any other regime. The same holds for women, who however also receive a significant number of part-time offers.
21. Estimated at 15 for low educated individuals (ISCED levels 0,1,2), 19 for middle educated individuals (ISCED levels 3,4,5), and 23 for high educated individuals (ISCED level 6).
[...] Table 5. Secondly, Table 6 reports the estimated elasticities following a 10% increase in wage offers. Besides total elasticities, we also report intensive margin elasticities, which reflect the average changes in labour supply of people who are already working. The third and fourth lines give the percentage (with respect to the total number of people in the gender-marital status category) of people starting/stopping work. The estimated elasticities are in the range (and rather at the right of the distribution) of those estimated in the literature (see for example Keane, 2011). Single men increase their labour supply slightly more than single women after an increase in wage offers, both at the intensive and at the extensive margin. 23 Regarding married individuals, estimates indicate that married women would increase their labour supply more than men. 24 This can be partly explained by the fact that married women are more often out of the labour market or working part-time, and thus have more room to increase their supply of labour.
22. The exception that could in our eyes be significant is the case of unemployed people finding no suitable job offer. In that case the unemployed might desire to find a good job, which provides a feeling of accomplishment, even if it would not be associated with an income increase. Also, young workers could desire to work many hours, even without additional salary, to build human capital and increase future utility.
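The quantities reported in such an elasticity table can be computed from simulated hours before and after the wage increase, as in the following Python sketch. The function name and the example data are hypothetical, and the intensive margin is here restricted to people who work in both scenarios, which is an assumption about the exact definition.

```python
def labour_supply_responses(hours_base, hours_new, wage_change=0.10):
    """Total and intensive-margin elasticities plus entry and exit shares."""
    pairs = list(zip(hours_base, hours_new))
    n = len(pairs)

    # Total elasticity: relative change in aggregate hours per unit wage change.
    total_base = sum(b for b, _ in pairs)
    total_new = sum(r for _, r in pairs)
    total_elasticity = (total_new / total_base - 1.0) / wage_change

    # Intensive margin: only people working before and after the change.
    stayers = [(b, r) for b, r in pairs if b > 0 and r > 0]
    stay_base = sum(b for b, _ in stayers)
    stay_new = sum(r for _, r in stayers)
    intensive_elasticity = (stay_new / stay_base - 1.0) / wage_change

    return {
        "total": total_elasticity,
        "intensive": intensive_elasticity,
        "share_starting": sum(1 for b, r in pairs if b == 0 and r > 0) / n,
        "share_stopping": sum(1 for b, r in pairs if b > 0 and r == 0) / n,
    }


# Hypothetical weekly hours of five people before and after a 10% wage increase.
result = labour_supply_responses([0, 0, 20, 40, 40], [0, 8, 21, 41, 40])
```

In this made-up example most of the aggregate response comes from one person entering work, so the total elasticity is much larger than the intensive-margin one.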
A number of married individuals will also react to an increase in their partners' wage. This is due on one hand to an income effect, and on the other hand to the complementarity between partners' leisure time. The income effect will tend to reduce the partners' labour supply, while the effect of leisure complementarity will depend on the labour supply change of the first partner. Positive and negative labour supply reactions are predicted both at the intensive and extensive margin, but in both cases the negative effect dominates when looking at the aggregate level. Married individuals will thus on average decrease their labour supply or leave the labour market following an increase in their partners' wage offers.
Lastly, we compare the observed distribution and the predicted distribution of some variables, using random draws for the stochastic part of the utility in the latter case. Density plots of observed and simulated hours worked, wages and disposable incomes are given in Appendix B.1 (Figures A3-A6).

Reform scenarios
We simulate 3 hypothetical scenarios that increase the generosity of the work-bonus, named reform 1, 2 and 3 respectively. 25 Those scenarios do not change the structure of the scheme, but only its generosity, in terms of base amount and/or eligibility thresholds, and are calibrated to have the same budgetary cost before behavioural changes, set at 300M EUR yearly. The first scenario consists of increasing the maximum amount of the reduction (parameter A), thereby further decreasing the participation tax rate for low earning capacity workers. The second scenario consists of increasing both the maximum amount and the thresholds proportionally (parameters A, θ1 and θ2), which, besides further decreasing participation tax rates, also extends the scheme to higher earning capacity workers. Finally, in the third scenario the maximum amount is kept constant, but the eligibility range is increased (parameters θ1 and θ2). 26
The base case policy is represented on Figure 8 together with the 3 reform scenarios. The continuous line represents the level of the existing WB granted to a full-time worker, as a function of his gross monthly wage. The dotted lines represent the 3 hypothetical reform scenarios. The domain starts at the level of the minimum wage, and one can observe that the phase-out area starts soon after, as explained in Section 3. It can also be seen that reforms 1 and 3 will change the withdrawal rate of the policy, to 29.2% and 16.9% respectively, while reform 2 keeps this rate unchanged at 21.9%. Lastly, the bars represent the distribution of monthly FTE gross wages across the population, and show that our reforms will change the budget set of a significant number of individuals.
23. Note that the number of men that stop working after a wage increase is slightly positive. This seemingly counter-intuitive result is due to the fact that disposable income does not always increase monotonically with labour income, because of discontinuities in certain means-tested benefits (for example a yearly 250 EUR exemption in the means-test for social assistance).
24. We use the adjective married to refer to members of the couples. Those can also be in a consensual union.
25. Note that we use 2016 as reference scenario, even though the workbonus has (very) slightly and gradually been augmented during the 2015-2019 tax-shift.
26. Note that in the reference scenario, the WB cannot exceed social security contributions due, and the FWB cannot exceed 640 EUR/year. In our reforms we allow the WB and FWB to be refundable benefits in case they would exceed those limits.
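The role of the three parameters can be sketched with a stylized schedule. The sketch assumes, as a simplification that the text does not confirm, that the WB equals A up to θ1 and is then withdrawn linearly until it reaches zero at θ2, so that the withdrawal rate equals A / (θ2 − θ1). All numerical values and function names are hypothetical and are not the statutory parameters.

```python
def work_bonus(wage, a, theta1, theta2):
    """Stylized monthly work-bonus for a full-time worker (assumed shape).

    a      -- maximum amount (parameter A in the text)
    theta1 -- gross FTE wage at which the phase-out starts
    theta2 -- gross FTE wage at which the benefit reaches zero
    """
    if wage <= theta1:
        return a
    if wage >= theta2:
        return 0.0
    return a * (theta2 - wage) / (theta2 - theta1)


def withdrawal_rate(a, theta1, theta2):
    """Benefit lost per additional euro of gross wage in the phase-out range."""
    return a / (theta2 - theta1)


# Hypothetical baseline values, chosen only for illustration.
BASE = dict(a=200.0, theta1=1600.0, theta2=2500.0)

# Reform 1 raises A only; reform 2 scales A, theta1 and theta2 together;
# reform 3 widens the eligibility range and leaves A unchanged.
reform1 = dict(BASE, a=BASE["a"] * 1.2)
reform2 = {k: v * 1.2 for k, v in BASE.items()}
reform3 = dict(BASE, theta1=BASE["theta1"] * 1.2, theta2=BASE["theta2"] * 1.2)

for name, p in [("base", BASE), ("reform 1", reform1),
                ("reform 2", reform2), ("reform 3", reform3)]:
    print(name, round(withdrawal_rate(**p), 4), work_bonus(2000.0, **p))
```

Under these assumptions, raising A alone steepens the phase-out, scaling all three parameters proportionally leaves the withdrawal rate unchanged, and widening the range flattens it, which matches the direction of the rates reported for reforms 1, 2 and 3.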

Theoretical predictions
Two effects can arise when changing the shape of the budget constraint of an individual working for a given hourly wage. If, in the reform scenario, income increases for a given number of hours, a person working that number of hours will feel richer and might reduce his labour supply in order to have more leisure time with an income higher than or equal to the initial one. This effect is called the income effect, and is stronger when the marginal utility of income is relatively small, provided that leisure is a normal good. On the other hand, a reform of in-work benefits will also change the marginal return to work (corresponding to the slope of the budget constraint). An increase in the marginal return to work means that ''not working'' (or ''leisure'') implicitly costs more in terms of foregone wages. People facing a higher (lower) marginal return to work will therefore have a stronger incentive to work more (less). Those effects are called substitution effects. Note that besides those two effects, occurring on a budget constraint for a given wage, workers can also opt for jobs paying different wages.
The change in budget constraint is illustrated for reform 1, for a single individual working at the minimum wage, on Figure 9. The dotted and continuous black lines represent the disposable incomes in the base and reform 1 scenarios, the latter being decomposed into different components. We see that the reform renders many in-work options more attractive. The reason for the flat part on the left is that, for small numbers of hours worked, the benefit of the reform is offset by lower means-tested social assistance benefits. Then come part-time and full-time job situations where the reform has a strong effect that increases with labour supply, which is straightforward given the design of the scheme. The benefit will start decreasing when working overtime. This result comes from the fact that the FTE wage, used for the determination of the WB level, is calculated by dividing gross earnings by contractual hours of work. People who have accepted a better paid job requiring frequent (informal) overtime thus see their hourly wage over-estimated in that calculation. 27 The option to work less and by doing so become eligible for the WB therefore becomes relatively more attractive. This third effect is not detectable with administrative data, where informal overtime is not taken into account. Lastly, note that for higher levels of hourly wages, the reform would have less impact, as the WB gradually phases out with hourly wages. Our reforms might thus incentivize individuals to switch to lower-paid jobs, which become relatively more attractive.
In summary, the options that become relatively more attractive following a WB increase are those that pay low wages and that require hours that are neither too low (as lower means-tested benefits will offset the WB) nor too high (as the hourly wage will be overestimated in case of frequent overtime work).
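The overtime mechanism can be made concrete with a small calculation for a worker earning 2500 EUR gross per month, working 200 hours per month on a 38-hour contractual week. The function names and the 52/12 weeks-per-month conversion are illustrative assumptions.

```python
WEEKS_PER_MONTH = 52 / 12  # assumed conversion from weekly to monthly hours


def actual_hourly_wage(gross_monthly, hours_worked_monthly):
    """Hourly wage based on the hours really worked, overtime included."""
    return gross_monthly / hours_worked_monthly


def assessed_hourly_wage(gross_monthly, contractual_weekly_hours=38.0):
    """Hourly wage as derived from contractual hours only."""
    return gross_monthly / (contractual_weekly_hours * WEEKS_PER_MONTH)


gross = 2500.0
print(round(actual_hourly_wage(gross, 200.0), 2))   # 12.5
print(round(assessed_hourly_wage(gross), 2))        # 15.18
```

The gap between the two figures is what makes a job with frequent informal overtime look better paid, per hour, than it actually is when eligibility is assessed on contractual hours.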

Simulation procedure
We use a procedure proposed by Duncan and Weeks (1998) to simulate the behavioural changes resulting from our tax-benefit reforms. The procedure is essentially a Monte Carlo method relying on repeated random sampling of the random utility term. The different steps of this procedure are detailed next.
The first step of the procedure consists of drawing, for each household, a set of k opportunities (market and out-of-market) from the estimated hours and wage distributions, conditional on the observed opportunity being included. For each opportunity, we calculate the corresponding disposable income in the base and reform scenarios with the arithmetic micro-simulation model EUROMOD, of which the parameters are adapted in the case of the reform scenario. This allows us to calculate the deterministic part of the utility function for each opportunity.
We then draw a vector of random utility terms from a Gumbel distribution in a way that guarantees that the total utility will be the highest for the observed choice, following a procedure proposed in Bourguignon et al. (2001). This operation is repeated m times to get m vectors of random Gumbel draws that predict choices that correspond to the observed choice.
Lastly, in each of the m cases we predict which would be the ''reform opportunity'' chosen by the household by adding those random utility terms to the deterministic utilities under the reform scenario. This gives rise to a probability distribution over the set of opportunities of each household under the new tax-benefit system. This probability distribution then allows us to calculate the expected hours of labour supply after the reform as well as expected income, tax and benefit amounts. Our simulations are run on the 2017 SILC dataset, which has income reference year 2016, and we set k = 50 and m = 10.
27. As an illustration, imagine a low-level manager earning a gross wage of 2500 EUR per month for a job that requires on average 200 hours of work monthly (of which some informal overtime). This person's hourly wage equals 12.5 EUR/hour. However, as the legal or contractual working week counts 38 hours (about 165 per month), his earning capacity in the eyes of the fiscal authority will be estimated at more than 15 EUR/hour, excluding him from the WB scheme.
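The steps above can be sketched for a single household as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the conditional draws use a standard property of the Gumbel distribution (the maximum utility is Gumbel with location log Σ exp(v), and the other utilities are Gumbel draws truncated at that maximum), which is one way to guarantee that the observed choice is optimal in the base scenario and is not necessarily the exact algorithm of Bourguignon et al. (2001). The utilities, hours and function names are hypothetical, and the toy choice set has 4 options rather than k = 50.

```python
import numpy as np


def conditional_gumbel_errors(v_base, observed, rng):
    """Draw Gumbel errors such that `observed` maximises v_base + error."""
    # The maximum of independent Gumbel utilities is itself Gumbel
    # distributed, with location log(sum(exp(v))).
    u_max = np.logaddexp.reduce(v_base) + rng.gumbel()
    # Unconstrained utilities for every alternative ...
    u = v_base + rng.gumbel(size=v_base.shape)
    # ... truncated from above at u_max.
    u = -np.logaddexp(-u, -u_max)
    u[observed] = u_max
    return u - v_base


def simulate_reform(v_base, v_reform, hours, observed, m=10, seed=0):
    """Return choice probabilities and expected hours under the reform."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(len(v_base))
    for _ in range(m):
        eps = conditional_gumbel_errors(v_base, observed, rng)
        # Same error terms, new deterministic utilities.
        counts[np.argmax(v_reform + eps)] += 1
    probs = counts / m
    return probs, float(probs @ hours)


# Hypothetical deterministic utilities for one household's choice set.
v_base = np.array([0.0, 0.4, 1.0, 0.7])
v_reform = np.array([0.0, 0.6, 1.3, 0.7])
hours = np.array([0.0, 19.0, 38.0, 50.0])

probs, expected_hours = simulate_reform(v_base, v_reform, hours,
                                        observed=3, m=1000)
print(probs, expected_hours)
```

Reusing the same error vector in the base and reform scenarios is what ties the predicted reform choice to the observed behaviour: a household only moves when the change in deterministic utility is large enough to overturn its unobserved preference for the observed option.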

Labour supply
All 3 scenarios predict an increase in labour supply if measured in terms of participation (number of people working positive hours). If measured in FTE, only scenarios 1 and 2 predict an increase, while scenario 3 predicts a small decrease. An overview of the impact of the 3 different reforms is given in Table 7. 28 Those effects are further decomposed for the different gender-marital status groups in Figure 10.
One can clearly observe that in all scenarios women increase their labour supply the most. The same holds for singles versus married. The first observation can be explained by the fact that women receive lower wage offers on average and therefore potentially benefit from the increased workbonus more often. In addition, they start from higher unemployment/inactivity rates, and decreased participation tax rates thus have a higher potential effect on labour market entry. Moreover, men more often have jobs requiring overtime, and as explained in 5.2, such jobs can in certain cases become relatively less attractive than other jobs after a workbonus increase, leading overtime workers to switch to jobs requiring fewer hours.
The smaller or negative labour supply reactions of couples are mainly explained by stronger income effects: low-wage workers experience an income increase, resulting in a reduction of labour supply of one (or both) of the household members. Note also that married men reduce worked hours on average, which is explained by a partial substitution of male labour by female labour supply. Overall, the WB leads to a more homogeneous distribution of work.
Finally, one can observe that total gross labour income decreases. This is driven by the decrease in average wages, due to a number of people switching to jobs that offer lower wages but desirable attributes, as those became relatively more attractive after the reform.
28. Not all workers are included in our simulation sample (see section 4.1). Our filtering procedure keeps 64.9% of the individuals who are available for the labour market in that dataset. If one were interested in the absolute changes in labour supply for the whole population, one possibility would be to divide the predictions of the model in Table 7 by 64.9%, which would be relatively accurate if people excluded from the sample have comparable reactions to the reforms on average.

Welfare analysis
Besides equivalized disposable income, 29 we present here, following Decoster and Haan (2014), two alternative welfare measures that account for leisure: the rent criterion and the full-time (FT) criterion. The rent is the income that gives the same utility to an individual as a particular point in the ''leisure-disposable income'' space, when his labour supply is 0. In a standard representation of the labour supply decision with labour supply on the abscissa and net income on the ordinate, it is found at the intersection of the indifference curve passing through a given bundle and the vertical line at labour supply h = 0. The FT criterion is found at the intersection of the indifference curve with the vertical line at the level of full-time labour supply. The main reason to prefer those measures over traditional individual welfare metrics (for example equivalent or compensating variations, fixed preference orderings, …) is that they respect maximally the heterogeneous preferences of the agents, while bringing the normative choices, unavoidably present in all interpersonal comparisons, more clearly to light. 30,31 This can be intuitively understood by the fact that those metrics use information on individual preferences, in the form of indifference curves, to translate the chosen ''leisure-disposable income'' combination into an ''equivalent'' income (rent or FT in this case). They moreover shed light on the normative assumptions made in the interpersonal comparison. For example, the rent-criterion gives maximal protection for people who give importance to leisure, and renders them minimally responsible for their aversion to work. The FT-criterion on the contrary treats industrious individuals (i.e. those having less distaste for work) more favourably.
29. We use the term equivalized income to denote the income divided by an equivalence scale, in order to take into account economies of scale in household expenses.
30. Normative analysis in a framework of heterogeneous preferences poses the difficult problems of comparability and aggregation of individual utilities (due to an incompatibility between two sets of axioms, one related to the respect of individual preferences and one related to interpersonal comparability, see Fleurbaey and Trannoy (2003)), that are often neglected (e.g. when simply aggregating compensating variations). One way to address these problems is to use a reference preference ordering (or a reference agent/household). The disadvantage of this choice is, however, that it would create an inconsistency between the positive part of our analysis, where household heterogeneity plays a role in the decision process, and the normative part of our analysis, where this heterogeneity is set aside. Fleurbaey (2006) and Fleurbaey (2007) propose another way to escape this incompatibility problem, by restricting (but retaining maximally) preference heterogeneity to what they call Subset Dominance. It is argued that giving priority to respecting preferences is less arbitrary and allows one to shed light on the ethical priors used (and thus the cut between personal and social responsibility) when selecting the subsets: the rent-criterion for example gives maximal protection for people who give importance to leisure, and renders them minimally responsible for their aversion to work. The FT-criterion on the contrary treats industrious individuals (i.e. those having less distaste for work) more favourably.
31. For a detailed overview of the normative foundations of the three most popular approaches to measure well-being in a multidimensional setting (the capability approach, the subjective well-being approach and the equivalent income approach), see Decancq et al. (2015). For a discussion on welfare analysis applied to Random Utility Models of Labour Supply, see Decoster and Haan (2010).
The welfare changes per equivalized disposable income decile are represented in Figure 11 for reform 1. The same graphs are provided for reforms 2 and 3 in Appendix B (Figures A1 and A2). One can see that the reform has the strongest impact on lower middle class individuals. The poorest individuals, of whom a significant share is out of the labour market because of stronger preferences for leisure and/or fewer job opportunities, 32 do not see their welfare increase much. The reason is that the welfare gains resulting from disposable income increases following the acceptance of a job are almost offset by the welfare losses due to the decrease in leisure time.
The third and fourth decile individuals on the contrary have higher employment rates, and many will thus benefit from a higher disposable income, without sacrificing leisure time. In the highest deciles, the average disposable income decrease is explained by a number of individuals opting for lower-paid jobs that have more attractive attributes. Welfare gains are thus small when measured with the FT and Rent criteria: gains in welfare due to preferred job-attributes barely exceed the losses due to the disposable income decreases.
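To make the rent and FT criteria concrete, the following minimal sketch computes both for a single individual by moving along the indifference curve through the chosen bundle. The log utility function, its parameters, the 80-hour time endowment and the 38-hour full-time reference are hypothetical stand-ins; the estimated utility function of the model is not reproduced here.

```python
import math


def utility(income, hours, alpha=0.6, time_endowment=80.0):
    """Hypothetical log utility over income and weekly leisure."""
    return (alpha * math.log(income)
            + (1 - alpha) * math.log(time_endowment - hours))


def equivalent_income(income, hours, ref_hours, lo=1e-6, hi=1e7, tol=1e-8):
    """Income at `ref_hours` that lies on the indifference curve through
    the bundle (income, hours), found by bisection."""
    target = utility(income, hours)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if utility(mid, ref_hours) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


income, hours = 1800.0, 30.0                 # hypothetical chosen bundle
rent = equivalent_income(income, hours, ref_hours=0.0)   # rent criterion
ft = equivalent_income(income, hours, ref_hours=38.0)    # FT criterion
print(round(rent, 2), round(ft, 2))
```

For a person working fewer hours than full time, the rent lies below the observed income and the FT measure above it. Two individuals with the same bundle but different tastes for leisure obtain different equivalent incomes, which is how preference heterogeneity enters both measures.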
In Table 8, we compute the population-wide weighted Gini-coefficient and poverty rate (assuming people out of the model do not change their labour supply). The Gini is calculated on equivalized incomes (using the OECD-modified scale), and the poverty rate is defined as the percentage of people having an equivalized income lower than 60% of the median equivalized income. All the reforms reduce inequality and poverty by a small amount, with reform 1 reducing both the most. This limited impact is due to the fact that the purchasing power gains resulting from the WB scheme, even though concentrated at the left of the wage distribution, are quite spread out over the equivalized income distribution.
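The two indicators can be computed along the following lines. This is a sketch under assumptions: the household data are invented, the weights are treated as person weights, and the Gini is obtained from the weighted Lorenz curve with the trapezoid rule, which may differ in detail from the estimator behind Table 8. The OECD-modified scale assigns 1 to the first adult, 0.5 to each additional person aged 14 or over and 0.3 to each child under 14.

```python
import numpy as np


def oecd_modified_scale(n_adults, n_children_under_14):
    """1 for the first adult, 0.5 per additional person aged 14 or over,
    0.3 per child under 14."""
    return 1.0 + 0.5 * (n_adults - 1) + 0.3 * n_children_under_14


def weighted_gini(income, weight):
    """Gini coefficient from the weighted Lorenz curve (trapezoid rule)."""
    order = np.argsort(income)
    x = np.asarray(income, float)[order]
    w = np.asarray(weight, float)[order]
    pop_share = w / w.sum()
    lorenz = np.concatenate([[0.0], np.cumsum(x * w) / np.sum(x * w)])
    return 1.0 - np.sum(pop_share * (lorenz[:-1] + lorenz[1:]))


def weighted_median(values, weight):
    """Smallest value at which cumulative weight reaches half the total."""
    order = np.argsort(values)
    v = np.asarray(values, float)[order]
    w = np.asarray(weight, float)[order]
    cum = np.cumsum(w)
    return v[np.searchsorted(cum, 0.5 * cum[-1])]


def poverty_rate(income, weight, threshold=0.6):
    """Weighted share of people below `threshold` times the median."""
    income = np.asarray(income, float)
    weight = np.asarray(weight, float)
    line = threshold * weighted_median(income, weight)
    return weight[income < line].sum() / weight.sum()


# Hypothetical households: disposable income, composition, weight.
hh_income = np.array([1200.0, 2600.0, 4100.0, 3000.0])
adults = np.array([1, 2, 2, 1])
children = np.array([0, 2, 1, 0])
weights = np.array([1.0, 1.5, 1.0, 0.5])

eq_income = hh_income / oecd_modified_scale(adults, children)
print(weighted_gini(eq_income, weights), poverty_rate(eq_income, weights))
```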

Budget
The total budget necessary for each reform was set at 300M EUR yearly before behavioural changes. 33 The decomposed effects on the government budget, both before and after behavioural changes (where we assume individuals not included in the labour supply model do not change labour supply), are given in Table 9. None of the reforms pays for itself. In other words, the reduction in government expenses that follows such reforms, resulting from a decrease in social assistance and unemployment benefits, does not compensate for the decreases in revenues from taxes and social security contributions. Those decreases are due to a direct effect resulting from the WB extension that reduces taxes and social security contributions paid by employees. In addition, workers will choose lower-paid jobs on average, which will reduce social security contributions, and to a lesser extent taxes, paid by employees even more. Those effects are not compensated by the additional taxes and social security contributions paid by new entrants.
32. In the lowest decile, individuals have an education level below average. Our estimates suggest that this results in stronger preferences for leisure and fewer job opportunities. Moreover, in the lowest decile, couples have more children than the average, which also contributes to higher preferences for leisure.
33. This is done by running EUROMOD iteratively and adapting tax-benefit parameters at each loop in order to approach a pre-defined budget deficit (300M EUR). We used a tolerance level of 1% (3M EUR).
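The iterative calibration of each reform to the 300M EUR target can be sketched as a one-dimensional search. The sketch does not call EUROMOD: `toy_static_cost` is a hypothetical stand-in for one microsimulation run, and bisection is an assumed search rule, since the text only states that parameters are adapted at each loop until the cost is within 1% of the target.

```python
def calibrate(static_cost, target=300e6, tol=0.01, lo=0.0, hi=1.0,
              max_iter=100):
    """Bisection on a generosity parameter until the static cost is within
    `tol` (relative) of `target`. `static_cost` must be increasing."""
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        cost = static_cost(mid)
        if abs(cost - target) <= tol * target:
            return mid, cost
        if cost < target:
            lo = mid
        else:
            hi = mid
    raise RuntimeError("no parameter found within tolerance")


def toy_static_cost(increase):
    """Hypothetical yearly cost (EUR) of raising the maximum amount by the
    share `increase`, before behavioural changes."""
    return 1.1e9 * increase + 2.0e8 * increase ** 2


increase, cost = calibrate(toy_static_cost)
print(round(increase, 4), round(cost / 1e6, 1))
```

In an actual application each evaluation of the cost function is a full tax-benefit simulation, so a search that needs few evaluations keeps the calibration cheap. For reform 2, where several parameters move together, the scalar would be a common scaling factor.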

Discussion
All 3 reforms predict labour supply increases when measured in participation. Reform 3, however, predicts a labour supply decrease when measured in FTE. An interesting complementary statistic for parsimonious policy makers who see in-work benefits mainly as a labour market activation policy is the average net cost of increasing labour supply by one FTE or, alternatively, by one participant. Note, however, that the denominator of this indicator can be close to 0 in reforms that create positive and negative labour supply changes that almost cancel out on average, as in our reform 3 and, to a lesser extent, reform 2, resulting in high numbers. We obtain costs per additional FTE ranging from 368.5 thousand EUR in reform 1 to 1660.1 thousand EUR in reform 2. In reform 3 the ratio is not calculated, as the number of FTE decreases. Other studies on relatively similar schemes find lower numbers for Belgium. For example, Orsini (2006b) and Dagsvik et al. (2011) estimate costs per FTE to lie between 136.0 and 204.1 thousand EUR and between 35.5 and 171.3 thousand EUR respectively, depending on the in-work benefit structure and structural model used. 34 Regarding neighbouring countries, Orsini (2006b) estimates costs per FTE in the order of 272.1 thousand EUR for the WFTC, 231.3 thousand EUR for the German Mini-Jobs tax reform and 163.2 thousand EUR for the French PPE.
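The sensitivity of this indicator to a small denominator is easy to see in a short calculation. The net cost and FTE changes below are hypothetical and are not the values of Tables 9 and 10.

```python
def cost_per_unit(net_cost_eur, delta_labour_supply):
    """Net budgetary cost per additional FTE (or participant).

    Returns None when labour supply does not increase, in which case the
    ratio has no meaningful interpretation.
    """
    if delta_labour_supply <= 0:
        return None
    return net_cost_eur / delta_labour_supply


# Hypothetical inputs: a 350M EUR net cost with different FTE responses.
for delta_fte in (1000.0, 200.0, -50.0):
    print(delta_fte, cost_per_unit(350e6, delta_fte))
```

With the same net cost, a fivefold smaller net FTE gain yields a fivefold higher cost per FTE, so the indicator mostly reflects how closely positive and negative responses offset each other.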
Costs per new participant are estimated in our simulation to lie between 121.1 and 560.7 thousand EUR, corresponding respectively to reforms 1 and 3 (Table 10). Only Dagsvik et al. (2011) give numbers for Belgium, located between 18.8 and 36.3 thousand EUR. For other European countries, Bargain and Orsini (2006) [...]. One explanation for the difference with previous studies (Orsini (2006b) and Bargain and Orsini (2006)) is that our simulation allows people not only to change worked hours, but also to opt for lower-paid jobs (with preferable attributes). In section 5.4 we showed that this would make the reform more costly, as people opt for lower-paid jobs, which mechanically decreases tax and social security contribution revenues of the government. This consequently increases the overall cost per additional worker of the reform. Particular attention should therefore be paid to this effect, especially when analyzing reforms that significantly change the relative attractiveness of jobs offering low wages, as is the case of the Belgian WB.
Those differences might also be explained by modelling decisions regarding how overtime workers are taken into account. When using survey data, it is common to exclude people declaring ''unrealistic'' amounts of hours, and when using a discrete hours choice model it is necessary to round the observed values to values in the choice set (in Orsini (2006a) and Orsini (2006b), people working more than 80 hours per week are excluded from the sample and the hours of those working between 45 and 80 hours are rounded to 50 hours), while in administrative data informal overtime workers are not detected (as in Dagsvik et al. (2011)). Those limits potentially result in underestimated intensive margin elasticities, as overtime workers reducing their labour supply are only taken into account to a limited extent.
As suggested by Decoster and Vanheukelom (2019) about the proposed Flemish WB extension, higher costs per additional FTE or participant might also be explained by decreasing marginal utility of income, meaning that further increasing in-work benefits will probably not have effects of the same magnitude as the initial introduction of those benefits. Along the same lines, it could be explained by the purchasing power increases (following real wage increases) over the last decades, which decreased the marginal utility of income at a given labour supply level.
Finally, those differences in estimates are probably partly explained by differing initial situations and simulated reforms: an in-work benefit is more likely to have a positive impact on employment in situations where more people are jobless. On the contrary, in an economy with full employment, the benefit will be costly and mainly create negative income effects.

Conclusion
In this paper, we discussed the history and the particularity of in-work benefit schemes in Belgium and simulated the effects of 3 different extensions of the work-bonus scheme, varying in terms of eligibility ranges and benefit amounts. We showed that such reforms would slightly increase labour supply when measured in terms of participation. When measured in terms of full-time equivalent workers, labour supply decreases in one of the 3 scenarios and slightly increases in the 2 others. Those small net effects are the result of opposing effects, both at the intensive and extensive margin. We then calculated that the cost of such reforms is significant, and that savings in social assistance and unemployment benefits resulting from higher employment rates are far from compensating for the large decreases in social security contributions and personal income taxes. Those decreases are exacerbated by the fact that some workers will opt for lower-paid jobs, which become relatively more attractive with the workbonus extension. This results in costs per FTE and per new participant that are considerable, starting at 368.5 thousand EUR per FTE and 121.1 thousand EUR per new participant. Compared to previous studies, those figures are rather high. In subsection 5.5, we gave several arguments why those figures could nevertheless be more accurate than previous estimates. We moreover calculated that the welfare gains are the highest for the third and fourth welfare deciles, while the first decile is almost unaffected.
The natural question that arises is whether increasing the workbonus is the most effective way to reach poverty reduction and labour market participation goals. One can reasonably wonder whether investing in other active labour market policies such as training programmes, additional coaching, more targeted wage subsidies or public jobs, possibly combined with investments in poverty-alleviating programmes, would not be a more adequate use of government spending.

Simulated distributions
To test the fit of the model, we take a vector of random Gumbel draws and add them to the deterministic utilities of the different options in each decision maker's choice set. We then compare the simulated distribution and observed distribution of some key variables. Note that we estimate the model using 7 cross-sectional data sets for years ranging from 2005 to 2015, while here we simulate distributions for 2015.
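A fit check of this kind can be sketched as follows. Unlike the reform simulation, the Gumbel draws here are unconditional. The utilities are random stand-ins, the "observed" data are themselves generated from the same toy model, and all decision makers share one hypothetical hours grid, whereas choice sets in the model are household-specific.

```python
import numpy as np


def simulate_choices(v, rng):
    """Pick, for each decision maker (row), the option with the highest
    deterministic utility plus an independent Gumbel draw."""
    return np.argmax(v + rng.gumbel(size=v.shape), axis=1)


rng = np.random.default_rng(0)
hours_grid = np.array([0.0, 19.0, 30.0, 38.0, 50.0])   # hypothetical options
v = rng.normal(size=(5000, hours_grid.size))            # stand-in utilities

# Stand-in for the observed data: one draw from the same model.
observed_hours = hours_grid[simulate_choices(v, rng)]
simulated_hours = hours_grid[simulate_choices(v, rng)]

# Compare the share of decision makers at each hours level.
for h in hours_grid:
    print(h, np.mean(observed_hours == h), np.mean(simulated_hours == h))
```

With real data, the same comparison would be made on hours, wages and disposable incomes, for instance by overlaying density plots of the observed and simulated values.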