Boys don’t cry (or do the dishes): Family size and the housework gender gap

Abstract
We use data from the 1970 British Cohort Study (BCS) to link family size to children's contribution to household chores at age 16 and to the adult housework gender gap. Assuming that home production is an increasing function of family size, and using an instrument to account for the endogeneity of fertility, we show that family size affects boys and girls differently at age 16: girls in large families are significantly more likely to contribute to housework, with no effect for boys. We then show that childhood family size affects the housework gender gap between the cohort members and their partners at age 34. Women who grew up in larger families are more likely to carry out a greater share of household tasks in adulthood than women from smaller families. In addition, growing up in a large family makes cohort members more likely to sort into households with a wider housework gender gap as adults. We show that the persistent effect of family size is due to the adoption of behaviours in line with traditional gender roles: a lower likelihood of employment and shorter commutes for women, along with a higher employment probability for their partners.


Introduction
Recent decades have seen a shift in the distribution of housework within couples. The time women devote to household chores has fallen, while men's participation in housework has risen. However, the housework gender gap is yet to be closed in most countries: the cross-national trends in Altintas and Sullivan (2016) show that convergence has stalled since the 1980s, especially in those countries where the gap was initially smaller. The burden of housework and childcare continues to disproportionately weigh on women, with consequences in terms of labour-market outcomes and well-being. Using data from the Multinational Time Use survey, Sayer (2010) shows that women in the early 2000s carried out 1.5 to 2 times as much housework as men in developed countries. Extending the analysis to the more recent waves of the American Time Use Survey, Bianchi et al. (2012) confirm that a decade later American women were still responsible for about 1.6 times more housework than men. Along the same lines, McMunn et al. (forthcoming) show that, in 2010, women in 93% of British couples spent more time on housework than their partners.
Standard models of household decision-making suggest that differences in bargaining power, arising from labour-market earnings and non-market productivity, help determine intra-household time allocation (Chiappori, 1992; Van Klaveren et al., 2008). Female labour-force participation and educational outcomes are at historical highs in most OECD economies (International Labour Office, 2018; World Economic Forum, 2018), with considerable consequences for women's bargaining power and the quality of their outside options (Antman, 2014; Bittman et al., 2003). A large body of empirical work shows that the time spent in home production falls as either absolute or relative earnings rise (Bittman et al., 2003; Gupta, 2007; Gupta and Ash, 2008; Bertrand et al., 2015) and that, conditional on being employed, educated women participate less in housework (Baxter et al., 2008). Despite this progress, why do women still devote disproportionately more time to housework than men?
Many researchers have turned their attention to explanations based on gender identity formation (Akerlof and Kranton, 2000, 2010). Gender identity develops according to the behavioural prescriptions that are commonly accepted in a given society. In societies where masculinity is defined by the principle that men should be the family breadwinners and should not engage in "feminine" forms of housework, individuals will find it costly to deviate from this prescription, as doing so would be at odds with their identity and would translate into a utility loss. This is consistent with empirical work showing that women who are more educated or earn more than their partners, and who thus deviate from gender-role prescriptions, compensate via a more traditional division of housework (Bittman et al., 2003; Lyonette and Crompton, 2015; Bertrand et al., 2015).
Even though gender-identity theory suggests that the housework gender gap is a by-product of utility-maximising behaviour, the subjective well-being literature shows that the gap can be perceived as unfair and, as such, can lower happiness. Using British, German and U.S. data, Lepinteur et al. (2016) and Flèche et al. (2018) show that women report lower subjective well-being when they work longer hours than their partners. This effect mostly reflects the fact that men do not step up and contribute more to housework as their partner's working hours rise, which women perceive as unfair. The authors also show that these women are more likely to divorce and to leave the labour force (or to find jobs with shorter weekly working hours).
However, comparatively little is known about the role of childhood characteristics in shaping adult differences in housework participation. Building on the intergenerational cultural socialisation framework of Bisin and Verdier (2001), some authors have shown that children's perceptions of gender roles are directly linked to their parents' attitudes, contributing to the persistence of unequal gender norms (Farré and Vella, 2013). Indirect evidence comes from the literature on the influence of parental labour-force status on the labour-market outcomes and attitudes of their children (Vella, 1994; Fernández et al., 2004).
Apart from direct exposure to parental gender norms, there are other ways in which children can be socialised into traditional gender roles within the family. One factor that may play a role here is the household's demographic structure. While we know that decisions involving marital status and fertility affect adults' labour-force participation (Angrist and Evans, 1998; Cruces and Galiani, 2007; Bloom et al., 2009; Baxter et al., 2008), the concomitant effects on intra-household time allocation may involve not only parents but also children. We here consider the role of family size: if the amount of housework rises with family size (Blundell et al., 2005; Cherchye et al., 2012), the time that parents move out of the labour market may not suffice to meet the greater demand for home production, so that children may be asked to step in and contribute more to housework (Brody and Steelman, 1985). If the effect of family size on children's housework contribution depends on their gender, a larger family might then feed through to the adult housework gender gap via factors such as educational achievement, future labour-market outcomes, fertility and gender attitudes.
To the best of our knowledge, the causal impact of family size on the allocation of household tasks in childhood, and the persistence of this effect into adulthood, has not been explored. We address the endogeneity of family size via an instrumental-variables approach, as in Angrist and Evans (1998). The latter estimate the impact of fertility on women's labour supply in the US using an instrument reflecting parental preferences for the sex composition of their children: parents whose first two children are of the same sex are more likely to have a third. Similarly, we restrict the analysis to families with two or more children and exploit parents' preferences for variety in the sex mix of their offspring to predict the number of children in the household. The Appendix provides an extensive discussion of the validity of the instrument in our context, following Conley et al. (2012) and carrying out a number of additional tests.
In our sample from the 1970 British Cohort Study (BCS), a larger family size during childhood increases the share of housework performed by girls at age 16, but not that of boys. This conclusion is robust to different measures of housework. Girls in larger families also consistently spend less time on other activities, namely homework and leisure. The effect of family size is mostly found in low-income and conservative households.
We then show that family size at age 16 also affects the division of household tasks in adulthood: at age 34, BCS cohort members from large families are more likely to live in households in which women spend more time than men on household tasks, and in which the housework gender gap is significantly larger. Again, the effect of childhood family size is significantly larger for cohort members who grew up in low-income and conservative households. The results at age 42 are similar. We then argue that this persistence is in large part due to the adoption of behaviours conforming to traditional gender roles: women who grew up in large families are more likely to be unemployed and to have an employed husband. Conditional on being employed, they are also less likely to commute. We also find consistent evidence of more conservative opinions among individuals who grew up in large families.
Our paper contributes to the existing literature in a number of ways. First, to the best of our knowledge, we are the first to use both cohort data and an instrumental-variable strategy to estimate the causal effect of family size on children's contribution to household tasks. The richness of our data also allows us to explore the effect of family size on the time spent in other activities, such as leisure or homework. Second, we use the same instrumental-variable strategy to estimate whether the effect of family size at age 16 persists and affects the housework gender gap of the cohort members once partnered at age 34. Last, we consider some of the channels through which childhood family size affects the adult division of housework, namely education, labour-market outcomes, fertility and gender norms.
The remainder of the paper is organized as follows. Section 2 reviews two strands of the literature: the first on the link between family size and children's contribution to housework, and the second on the influence of family size on a set of determinants of the adult housework gender gap. Section 3 then describes the data and identification strategy, and the empirical results at age 16 appear in Section 4. The results at age 34 are discussed in Section 5. Last, Section 6 concludes.
2 Literature review
2.1 The determinants of the division of housework among children
Theoretical models of household time allocation usually consider that only adults carry out household tasks, while children, if anything, create the need for more housework (Blundell et al., 2005; Cherchye et al., 2012). However, time-use surveys reveal that children actually spend a significant amount of time performing household tasks (Peters and Haldeman, 1987; Bianchi and Robinson, 1997). Children's contribution to housework can first be explained by parental time constraints: employed parents may not have sufficient time to handle the housework load and may ask their children to help. We expect children to be imperfect substitutes for their parents, as they are likely to be less productive than adults and can only contribute to a limited set of tasks. It can also be argued that parents ask their children to help with household tasks because they wish to transmit a set of skills to them and foster their human capital (Blair, 1992a).
The empirical literature on children's contribution to household tasks is small and mostly non-causal. Using US data, Gager et al. (1999) show that girls aged between 3 and 11 spend more time on housework than boys do. Girls also carry out more household tasks when their mother is employed full-time (Peters and Haldeman, 1987; Blair, 1992a), while the evidence is inconclusive for boys (Blair, 1992b). Antill et al. (1996) find that parental involvement in household tasks positively predicts children's housework participation.
We here focus on the effect of family size on the allocation of housework among children, which, according to Brody and Steelman (1985), is theoretically ambiguous. An additional household member increases the housework load and may lead parents to ask their children to participate to a greater extent.
At the same time, an additional child also increases the number of potential helping hands in the household. The net effect of family size on the housework load per child will then be positive (negative) if the new household member's contribution is lower (higher) than the marginal increase in housework that her presence entails.
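The sign of this net effect can be illustrated with a toy calculation. The sketch below is our own illustration, not a model from Brody and Steelman (1985): it assumes that parental time is fixed, that the existing children share the housework left over by the parents equally, and that all quantities are in hypothetical units. The function names are likewise hypothetical.

```python
def load_per_existing_child(residual_housework, n_children,
                            extra_housework=0.0, newcomer_help=0.0):
    """Housework left to each existing child (toy model, hypothetical units).

    residual_housework: housework not covered by the parents before the arrival
    n_children: number of existing children, assumed to share it equally
    extra_housework: additional housework generated by the new member
    newcomer_help: housework the new member takes on
    """
    if n_children <= 0:
        raise ValueError("need at least one existing child")
    return (residual_housework + extra_housework - newcomer_help) / n_children


def net_effect(residual_housework, n_children, extra_housework, newcomer_help):
    """Change in the load per existing child caused by one more household member."""
    before = load_per_existing_child(residual_housework, n_children)
    after = load_per_existing_child(residual_housework, n_children,
                                    extra_housework, newcomer_help)
    return after - before


# The newcomer creates 4 units of housework but helps with only 1: the load rises.
print(net_effect(10.0, 2, extra_housework=4.0, newcomer_help=1.0))  # 1.5
# The newcomer helps with more than the housework they create: the load falls.
print(net_effect(10.0, 2, extra_housework=4.0, newcomer_help=6.0))  # -1.0
```

In this simplified setting the change per existing child is (extra housework minus newcomer help) divided by the number of existing children, so the load rises exactly when the newcomer's help falls short of the housework they generate.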
Using US samples of children aged 3 to 11 and 12 to 16 respectively, Bianchi and Robinson (1997) and Gager et al. (1999) both find a positive relationship between family size and children's time spent on housework. These papers are the most closely related to our first empirical question, but neither addresses endogeneity: family size enters only as a control variable. However, fertility decisions are not random and depend on confounding factors that may also be directly related to the allocation of housework among children.

The effects of family size in childhood on intra-household time allocation in adulthood
The housework gender gap can be defined as the difference between women and men in the time spent on housework. A body of theoretical and empirical work has sought to understand why women still devote more time to household tasks than men do. On the theoretical side, both unitary and collective models of household decision-making suggest that the partner with the lower earnings should spend relatively more time on housework (Stratton, 2015). Bittman et al. (2003), Gupta (2007) and Gupta and Ash (2008) confirm this prediction empirically: women contribute less to housework as their earnings rise. As earnings are positively correlated with human capital, we expect the housework gender gap to be smaller in households where the wife is highly educated. Baxter et al. (2008) use Australian data to show that, conditional on being employed, women with a Bachelor's degree spend less time on average on household tasks than women without one.
While the education gap between men and women has almost closed (World Economic Forum, 2018), the housework gender gap remains. The literature stemming from the seminal work on gender identity by Akerlof and Kranton (2000, 2010) attributes part of this persistence to gender norms. In Bittman et al. (2003), couples in the US and Australia that deviate from the norm that "a husband should make more money than his wife" compensate with a more traditional division of housework. This finding is corroborated by Bertrand et al. (2015), who also show that, controlling for the absolute level of income, women with a higher probability of out-earning their husbands are less likely to participate in the labour force. An extensive sociological literature has confirmed that, holding earnings constant, egalitarian attitudes about the gender division of labour are associated with a smaller housework gender gap (see Carlson and Lynch, 2013, for a detailed review).
Baxter (2005) and Baxter et al. (2008) emphasize the role of life-course transitions in the housework gender gap: while men's contribution to household tasks is relatively insensitive to marital status and the number of children, marriage and motherhood significantly increase that of women. Using British and German data respectively, Schober (2011) and Grunow et al. (2012) confirm the asymmetric effect of parenthood on parental contributions to housework. Schober (2011) also shows that parents with more egalitarian gender attitudes share housework more equally.
The three groups of determinants of the housework gender gap described above (i.e. education and earnings, gender norms and demographics) have one thing in common: they are all likely to be influenced by childhood family structure and, as such, are good candidates for mediating an effect of childhood family size on the adult division of household tasks. The paragraphs below review some of the literature on the relationship between childhood family size and adult outcomes. Björklund and Salvanes (2011) note that a number of contributions have found large and robust negative associations between family size and different measures of child quality, such as educational achievement and adult labour-market outcomes. This is in line with the theoretical literature on the trade-off between child quality and quantity (Becker, 1960; Becker and Lewis, 1973). However, the use of instrumental variables to address the endogeneity of fertility decisions produces more nuanced results. Cáceres-Delpiano (2006), Angrist et al. (2010) and Åslund and Grönqvist (2010) find no causal effect of family size on adult educational achievement or labour-market outcomes, while Conley and Glauber (2006) and Black et al. (2010) find negative and significant effects on private-school attendance and IQ respectively. Anderton et al. (1987), Booth and Kee (2009), Kolk (2014) and Fasang and Raab (2014) find evidence supporting the intergenerational transmission of fertility decisions. Instrumenting for family size in Norwegian data, Cools and Hart (2017) find a differential effect of childhood family size on adult fertility by gender: an additional sibling increases male fertility but reduces female fertility.
The authors argue that this difference comes from mothers reducing their labour supply relatively less when they have a daughter than when they have a son. Cools and Hart (2017) also provide descriptive evidence of a substitution effect, as girls are more likely than boys to help with housework as family size rises. Girls then become more aware of the strain associated with large families and limit their own number of children in adulthood.
One may also expect adult gender attitudes to be influenced by childhood family size. We know that family structure and parental background play a role in the intergenerational transmission of gender attitudes. Vella (1994) uncovers a relationship between young women's attitudes towards female employment and their parents' educational background and labour-market behaviour. Using differences in the male draft across US states as an exogenous source of variation in mothers' labour-force participation, Fernández et al. (2004) argue that men who grew up in families with working mothers develop less stereotypical gender attitudes and are less likely to be the household breadwinner.
The intergenerational transmission of gender attitudes can be tested directly by correlating parental gender attitudes with those of their children. Using the NLSY1979, Farré and Vella (2013) find that the mother's views of the role of women, both in the family and in the labour market, affect the views of her children. They also show that mothers with less traditional views about the role of women are more likely to have working daughters and working daughters-in-law (consistent with Fernández et al., 2004). Using the British Cohort Study, Johnston et al. (2013) infer the gender norms of the parents from the share of housework carried out by the mother, and find that conservative parents have sons and sons-in-law who perform less housework in adulthood.

As it affects the allocation of household tasks among boys and girls, family size is then likely to affect children's gender norms.
3 Data and empirical strategy

The British Cohort Study (BCS)
Our empirical analysis is based on the British Cohort Study (BCS). The 1970 BCS follows the lives of more than 17,000 people born in England, Scotland and Wales in a single week of 1970. Over the course of the lives of cohort members, the 1970 BCS has collected information on, amongst others, physical, educational and social development, health, economic circumstances and gender attitudes.
Since the birth wave of the survey in 1970, there have been eight other waves ("sweeps") at ages 5, 10, 16, 26, 30, 34, 38 and 42. At each sweep, different sources and methods were used to gather information on the cohort members. In the birth survey, the main questionnaire was completed by the midwife present at birth and supplementary information was obtained from clinical records. As the cohort members aged, questionnaires were administered to parents, teachers and, eventually, cohort members themselves. Medical examinations were also carried out and cohort members participated in thorough assessments.
The first outcome variable of interest during childhood is the contribution of the cohort member to household tasks. This is derived from the question "What kind of things do you help with at home?", asked when the cohort member is about 16 years old. The question is followed by a set of twelve items, each depicting a particular area of contribution to housework. The items are listed in the questionnaire as follows: "Shopping", "Washing up", "Cleaning the house", "Making the bed", "Cooking", "Looking after elderly relatives", "Looking after pets", "Washing and/or ironing clothes", "Gardening", "Cleaning car if any", "Painting or decorating" and "Looking after younger children if any". The possible answers are "Regularly", "Sometimes" or "Rarely or never".
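The age-16 outcome used later (the share of tasks the cohort member helps with "Regularly") can be computed from these twelve items as sketched below. This is an illustration under assumptions, not the authors' code: the record layout (a dictionary from item label to answer) is hypothetical, and we assume that missing or invalid items, such as "Cleaning car if any" in households without a car, are dropped from the denominator.

```python
# Response categories and the twelve items of the age-16 question, as listed above.
RESPONSES = ("Regularly", "Sometimes", "Rarely or never")
TASKS_16 = (
    "Shopping", "Washing up", "Cleaning the house", "Making the bed", "Cooking",
    "Looking after elderly relatives", "Looking after pets",
    "Washing and/or ironing clothes", "Gardening", "Cleaning car if any",
    "Painting or decorating", "Looking after younger children if any",
)


def share_regularly(answers):
    """Share of answered items that the teenager helps with "Regularly".

    answers maps item labels to one of RESPONSES (hypothetical layout); items that
    are missing or carry any other value are dropped from the denominator.
    """
    valid = [answers[t] for t in TASKS_16 if answers.get(t) in RESPONSES]
    if not valid:
        return None  # no usable answers
    return sum(a == "Regularly" for a in valid) / len(valid)


# Hypothetical respondent: three of the twelve items are done regularly.
example = {t: "Rarely or never" for t in TASKS_16}
example.update({"Washing up": "Regularly", "Making the bed": "Regularly",
                "Shopping": "Regularly", "Cooking": "Sometimes"})
print(share_regularly(example))  # 0.25
```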
A second housework outcome variable refers to the cohort member and their partner at age 34.
Cohort members married to or cohabiting with a partner were asked to report who does most of the following household tasks: "Shopping", "Washing up", "Cleaning the house", "Cooking", "Paying the bills", "Looking after children when they are ill", "Washing and/or ironing clothes" and "Looking after the children in general". The possible answers were "I do most of it", "My partner does most of it", "We share more or less equally" or "Someone else does it".
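One way to turn these answers into the age-34 outcomes (the shares of tasks carried out by the woman and by the man, and their difference) is sketched below. The coding is our assumption: each share is the fraction of answered tasks that one partner does "most of", shared tasks and tasks done by someone else stay in the denominator, unanswered items (e.g. childcare tasks in childless couples) are dropped, and the couple is assumed to be different-sex. The function and constant names are hypothetical.

```python
TASKS_34 = (
    "Shopping", "Washing up", "Cleaning the house", "Cooking", "Paying the bills",
    "Looking after children when they are ill", "Washing and/or ironing clothes",
    "Looking after the children in general",
)
VALID = {"I do most of it", "My partner does most of it",
         "We share more or less equally", "Someone else does it"}


def housework_shares(answers, respondent_is_woman):
    """Return (woman_share, man_share, gap) for a different-sex couple.

    Each share is the fraction of answered tasks done mostly by that partner;
    the gap is the woman's share minus the man's share.
    """
    valid = [answers[t] for t in TASKS_34 if answers.get(t) in VALID]
    if not valid:
        return None
    own = sum(a == "I do most of it" for a in valid) / len(valid)
    partner = sum(a == "My partner does most of it" for a in valid) / len(valid)
    # Map "respondent" and "partner" onto woman and man.
    woman, man = (own, partner) if respondent_is_woman else (partner, own)
    return woman, man, woman - man


# Hypothetical female respondent: 4 tasks mostly hers, 1 mostly her partner's, 3 shared.
answers = dict.fromkeys(TASKS_34[:4], "I do most of it")
answers[TASKS_34[4]] = "My partner does most of it"
answers.update(dict.fromkeys(TASKS_34[5:], "We share more or less equally"))
print(housework_shares(answers, respondent_is_woman=True))  # (0.5, 0.125, 0.375)
```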
We further combine information on household composition at birth and at age 16 to create a measure of family size, adding the younger siblings of the cohort member at age 16 to her older siblings as reported in the birth sweep.[1] We also know the gender and birth date of all of the cohort member's siblings, which we use to construct our instrumental variable.
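A minimal sketch of how family size, the same-sex instrument and the cohort member's birth order could be derived from such sibling records. The record layout, a (sex, birth date) tuple per child with siblings already merged across the birth and age-16 sweeps, is hypothetical; ties in birth dates (e.g. twins) are resolved by input order, which is an arbitrary assumption.

```python
from datetime import date


def family_measures(cohort_member, siblings):
    """Family size, same-sex instrument and birth order (hypothetical record layout).

    cohort_member: (sex, birth_date) of the study child, sex coded "F" or "M".
    siblings: list of (sex, birth_date) for all siblings, older siblings taken from
        the birth sweep and younger ones from the age-16 sweep.
    Returns None for one-child families, which are outside the estimation sample.
    """
    # Tag each child so the cohort member can be found after sorting by birth date.
    children = [(born, sex, False) for sex, born in siblings]
    children.append((cohort_member[1], cohort_member[0], True))
    children.sort(key=lambda c: c[0])  # stable sort: ties such as twins keep input order
    if len(children) < 2:
        return None
    return {
        # Family size: number of siblings plus the cohort member.
        "family_size": len(siblings) + 1,
        # Instrument: first- and second-born children are of the same sex.
        "same_sex": int(children[0][1] == children[1][1]),
        # Birth order of the cohort member (1 = first-born), for the birth-order dummies.
        "birth_order": next(i for i, c in enumerate(children, start=1) if c[2]),
    }


cm = ("F", date(1970, 4, 5))
sibs = [("M", date(1966, 2, 1)), ("M", date(1968, 9, 12)), ("F", date(1973, 6, 30))]
print(family_measures(cm, sibs))  # {'family_size': 4, 'same_sex': 1, 'birth_order': 3}
```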

The endogeneity of family size
Our first goal is to estimate the impact of family size on teenagers' contributions to household tasks. As a first step, Figure 1 shows the kernel density of the contribution to household chores of BCS cohort members at age 16 (calculated as the share of household tasks the cohort member helps with "Regularly", as opposed to "Sometimes" or "Rarely or never"), by gender and family size. Consistent with the extant literature, we find that, for any family size, girls contribute more to household tasks than boys do. The descriptive results in Figure 1 further suggest that while family size does not much affect boys' contribution to housework, girls in larger families spend more time on household tasks than girls in smaller families do.
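Densities of this kind can be sketched with a Gaussian kernel estimator computed separately for each (gender, family size) cell. The paper does not report its kernel or bandwidth; the sketch assumes a Gaussian kernel with Silverman's rule-of-thumb bandwidth (1.06 times the standard deviation times n to the power minus one fifth), and the record layout and function names are hypothetical. Note that a Gaussian kernel places some mass outside the [0, 1] range of the share.

```python
import math
import statistics


def gaussian_kde(data):
    """Gaussian kernel density estimate with Silverman's rule-of-thumb bandwidth."""
    n = len(data)
    sd = statistics.stdev(data)  # requires at least two observations
    if sd == 0:
        raise ValueError("bandwidth undefined for constant data")
    h = 1.06 * sd * n ** (-1 / 5)
    norm = 1.0 / (n * h * math.sqrt(2 * math.pi))

    def density(t):
        # Average of normal kernels centred on each observation.
        return norm * sum(math.exp(-0.5 * ((t - x) / h) ** 2) for x in data)

    return density, h


def densities_by_group(records):
    """records: iterable of (sex, family_size, share_regularly); one KDE per cell.

    Cells with fewer than two observations are skipped.
    """
    groups = {}
    for sex, size, share in records:
        groups.setdefault((sex, size), []).append(share)
    return {key: gaussian_kde(vals)[0] for key, vals in groups.items() if len(vals) > 1}
```

Each returned function can be evaluated on a grid over [0, 1] to draw one curve per cell.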
The evidence in Figure 1 is suggestive of a role of family size for girls, but does not address endogeneity. The distribution of fertility across households cannot be assumed to be random, as it depends on a set of both observable and unobservable household characteristics that may well be correlated with household tasks both during childhood and adulthood.
For example, in the BCS, family size at age 16 is larger when the mother is not employed and holds conservative opinions about maternal employment. As they are on average less educated and less likely to be employed, mothers in large families mechanically have more time to spend on housework, which in turn crowds out their children's own contribution. Naïve specifications that do not account for this negative selection into parenthood and for mothers' time use may thus underestimate the true effect of family size on children's contribution to housework.

[1] As our identification strategy relies on the gender composition of the two first-born children in the household, we measure family size as the total number of siblings. We are aware that this measure may include siblings who had already left the household when the cohort member was 16. We address this potential concern by using the number of children living in the household at the fourth survey sweep as an alternative measure of family size. This alternative measure produces even stronger results (available upon request), although, as expected, the instrument appears to be slightly weaker.
The endogeneity of family size is commonly addressed via instrumental variables (IV). A popular strategy is to instrument the size of families with at least two children by the sex composition of the two first-born children. To the best of our knowledge, Angrist and Evans (1998) is the first influential work to use this strategy, which it applies to estimate the causal impact of family size on women's labour supply.
The rationale is that parents have a preference for variety: a couple whose first two children are of the same sex is more likely to try for a third than a couple whose first two children are a boy and a girl. As the sex mix of children can be seen as random, the instrument provides the exogenous variation necessary for plausible identification. This approach has been widely used in the literature to assess the impact of family size on a variety of child outcomes, such as education, fertility and labour-market outcomes (Angrist et al., 2010; Black et al., 2010; Cools and Hart, 2017).
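With a single binary instrument and no controls, the IV estimate reduces to the Wald estimator: the difference in mean outcomes between same-sex and mixed-sex families divided by the difference in mean family size. The sketch below illustrates this logic only; the function name and the numbers in the example are hypothetical and are not BCS estimates.

```python
from statistics import mean


def wald_estimate(y, d, z):
    """IV estimate with one binary instrument and no controls (Wald estimator).

    y: outcome (e.g. share of chores done regularly), d: family size,
    z: 1 if the first two children are of the same sex, 0 otherwise.
    """
    y1 = [yi for yi, zi in zip(y, z) if zi == 1]
    y0 = [yi for yi, zi in zip(y, z) if zi == 0]
    d1 = [di for di, zi in zip(d, z) if zi == 1]
    d0 = [di for di, zi in zip(d, z) if zi == 0]
    first_stage = mean(d1) - mean(d0)   # how much same-sex siblings raise family size
    reduced_form = mean(y1) - mean(y0)  # how much same-sex siblings raise the outcome
    if first_stage == 0:
        raise ZeroDivisionError("instrument does not shift family size")
    return reduced_form / first_stage


# Hypothetical toy data: a 0.05 outcome gap over a 0.5-child family-size gap.
print(wald_estimate([0.30, 0.40, 0.25, 0.35], [3, 4, 3, 3], [1, 1, 0, 0]))
```

The ratio scales the reduced-form effect of the instrument by how much it shifts family size, which is why a weak first stage (a small denominator) makes the estimate imprecise.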
Multiple births can also be seen as a source of exogenous change in family size. A number of articles have used twin births as an instrument to estimate the causal impact of family size on outcomes such as women's labour supply (Rosenzweig and Wolpin, 2000; Angrist et al., 2010) and children's education (Black et al., 2005; Cáceres-Delpiano, 2006; Åslund and Grönqvist, 2010). However, using individual data on 17 million births across 72 countries, Bhalotra and Clarke (forthcoming) show that twin births are systematically positively correlated with maternal health. This finding is robust to a battery of tests and casts doubt on the validity of multiple births as an instrument for family size.

Empirical strategy
We account for the endogeneity of family size by following the instrumental-variable approach in Angrist and Evans (1998). Our instrument is a dummy for the first two children of a couple being of the same sex. We do not make use of multiple births in our main analysis for a number of reasons.
First, as noted above, Bhalotra and Clarke (forthcoming) suggest that multiple births are not random and may reflect positive selection into motherhood, which could bias our estimates. Second, our estimation sample is of limited size, and the resulting lack of statistical power could undermine the analysis; Black et al. (2005) raise a similar concern about estimation samples of sizes comparable to ours.[2]

We first estimate the following model by Two-Stage Least Squares (2SLS):

$$\mathit{HhTasks}^{16}_i = \beta_0 + \beta_1\,\mathit{FamSize}^{16}_i + X_i'\beta_2 + \varepsilon_i \qquad (1)$$
$$\mathit{FamSize}^{16}_i = \alpha_0 + \alpha_1\,\mathit{SameSex}_i + X_i'\alpha_2 + u_i$$

where $\mathit{FamSize}^{16}_i$ is family size at age 16, calculated as the total number of siblings of the cohort member plus one, and $\mathit{HhTasks}^{16}_i$ is the contribution to household tasks of individual $i$ at age 16, calculated as the share of household tasks the cohort member helps with "Regularly" (as opposed to "Sometimes" or "Rarely or never"). We use $\mathit{SameSex}_i$, a dummy for the first- and second-born children being of the same sex, as an instrument for $\mathit{FamSize}^{16}_i$. $X_i$ is a vector of standard controls, including dummies for sex and for the child being of European descent. We measure parental education by the age at which the parents left school, and include a dummy for the child's parents still living together in 1986. We control for household income when the study child was 10 years old (as it was not measured when the study child was 16). We also include an index, constructed by the data providers, of the mother's attitudes towards maternal employment when the study child was 5 years old. We control for a potential independent effect of the gender mix of all the siblings by adding a dummy, Balanced, for there being at least two siblings of different sex in the family. As the instrument predicts the probability of being third-born, we include a set of birth-order dummies to capture this relationship.
Additionally, we account for potential differences in children's housework contribution arising from their human capital by controlling for measures of cognitive and non-cognitive skills (respectively, a dummy equal to one if the respondent has an O-level, and the malaise score). Finally, family size may affect children's contribution to housework via mothers' labour-force participation: women with large families are less likely to be in the labour force and more likely to spend time at home, which crowds out their children's housework. We take this into account by controlling for a dummy for the mother having a regular paying job when the study child is 16 years old.
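The 2SLS procedure described above (family size instrumented by the same-sex dummy, with controls) can be sketched as two OLS regressions. The sketch is a self-contained illustration rather than the authors' estimation code: it uses a single control, solves the normal equations by hand, and runs on simulated data in which an unobserved factor (labelled "maternal time at home", an assumption mirroring the selection story above) raises family size and lowers children's chores. All coefficients in `simulate` are invented. Only point estimates are returned; the standard errors of a manual second-stage OLS are not valid 2SLS standard errors.

```python
import random


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(y, rows):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def two_sls(y, d, z, x):
    """Return (2SLS coefficient on d, first-stage coefficient on z), controlling for x."""
    first = [[1.0, zi, xi] for zi, xi in zip(z, x)]
    g = ols(d, first)  # first stage: family size on instrument and control
    d_hat = [sum(c * v for c, v in zip(g, row)) for row in first]
    second = [[1.0, dh, xi] for dh, xi in zip(d_hat, x)]
    return ols(y, second)[1], g[1]


def simulate(n=20000, seed=1970):
    """Toy data with a true family-size effect of 0.05 and negative selection."""
    rng = random.Random(seed)
    y, d, z, x = [], [], [], []
    for _ in range(n):
        same_sex = 1.0 if rng.random() < 0.5 else 0.0
        control = rng.gauss(0.0, 1.0)        # observed control
        maternal_time = rng.gauss(0.0, 1.0)  # unobserved confounder
        fam = 2.5 + 0.3 * same_sex + 0.5 * control + 0.5 * maternal_time + rng.gauss(0.0, 0.5)
        chores = 0.25 + 0.05 * fam + 0.1 * control - 0.05 * maternal_time + rng.gauss(0.0, 0.1)
        y.append(chores)
        d.append(fam)
        z.append(same_sex)
        x.append(control)
    return y, d, z, x


y, d, z, x = simulate()
beta_ols = ols(y, [[1.0, di, xi] for di, xi in zip(d, x)])[1]
beta_iv, first_stage = two_sls(y, d, z, x)
print(round(beta_ols, 3), round(beta_iv, 3), round(first_stage, 2))
```

In this toy setting OLS is pulled towards zero by the confounder, while 2SLS recovers an estimate close to 0.05, mirroring the direction of bias discussed in the text.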
As in Cools and Hart (2017), one may worry that $\mathit{SameSex}_i$ does not satisfy the exclusion restriction and affects cohort members' allocation of housework independently of family size, despite the rich set of controls included in the analysis. As the contribution to household tasks differs by gender, having a sister rather than a brother might reduce the cohort member's contribution to housework via a crowding-out effect. The question here is the extent to which plausible failures of the exclusion restriction might bias our 2SLS estimates.
Appendix B provides an extensive discussion of the validity of our instrument, additional robustness checks and an analysis of "plausible exogeneity" à la Conley et al. (2012). We argue there that potential violations of the exclusion restriction are unlikely to drive our results: if anything, they would attenuate them.
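Why a negative direct effect of the instrument would attenuate the estimates can be seen from a simple sensitivity calculation. Suppose, as an assumption in the spirit of Conley et al. (2012), that the outcome equation also contains a direct effect gamma of the same-sex dummy. In the just-identified case the IV estimand is then beta + gamma / pi, where pi is the first-stage coefficient on the instrument, so a negative gamma makes the IV estimate understate beta. The sketch below maps assumed values of gamma into the implied beta; it covers only this point-estimate logic, not the confidence-interval procedures in Conley et al. (2012), and the input numbers are hypothetical.

```python
def implied_beta(beta_iv, first_stage, gamma):
    """Family-size effect implied by an assumed direct effect gamma of SameSex.

    If y = beta * famsize + gamma * same_sex + ..., the IV estimand equals
    beta + gamma / pi, where pi is the first-stage coefficient on same_sex.
    """
    if first_stage == 0:
        raise ZeroDivisionError("first-stage coefficient must be non-zero")
    return beta_iv - gamma / first_stage


def implied_range(beta_iv, first_stage, gammas):
    """Implied beta for each assumed gamma (a simple sensitivity grid)."""
    return [(g, implied_beta(beta_iv, first_stage, g)) for g in gammas]


# Hypothetical inputs, not BCS estimates.
for gamma, beta in implied_range(0.05, 0.25, [-0.01, 0.0, 0.01]):
    print(f"gamma={gamma:+.2f}  implied beta={beta:.3f}")
```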
Our first estimation sample covers individuals from families with at least two children and with valid information on both the household tasks performed at age 16 and the controls. This produces 3,389 observations. On average, one reported task in four is performed "regularly", and the average family size in our estimation sample is 2.8. Full descriptive statistics for this estimation sample appear in Table A1 in Appendix A. Only 6,349 of the roughly 13,000 solicited families completed and returned the questionnaire measuring children's contribution to housework ("Document G: Home and All That"). In Table A2 we ask whether children in our estimation sample differ significantly from those with similar characteristics who did not complete "Document G: Home and All That" (i.e. children with at least one sibling and with valid information on the controls, but no information on the household tasks performed at age 16). Our estimation-sample children have on average a better family background (richer households, more educated parents and a more stable parental relationship) and are mostly girls. This is not surprising, as male survey respondents and respondents whose parents have low levels of education typically have higher attrition and non-response rates than female respondents and those with more educated parents (Mostafa and Wiggins, 2015). We find a similar pattern of selection when comparing our estimation sample to the overall BCS population with non-missing information on the controls.
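The comparison in Table A2 amounts to a set of tests of equal means between respondents and non-respondents, one per characteristic. The text does not state which test is used; the sketch below assumes a Welch (unequal-variance) two-sample t statistic, and the function name and example values are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(sample, comparison):
    """Welch t statistic for a difference in means between two groups.

    sample: values of one characteristic in the estimation sample;
    comparison: the same characteristic among non-respondents.
    Each group needs at least two observations.
    """
    va = variance(sample) / len(sample)
    vb = variance(comparison) / len(comparison)
    return (mean(sample) - mean(comparison)) / math.sqrt(va + vb)


# Hypothetical values of, say, the age at which the mother left school.
print(welch_t([16, 17, 18, 16], [15, 16, 16, 15]))
```

Large absolute values of the statistic flag characteristics on which the two groups differ, which is how selection on family background would show up.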
We then ask whether family size at age 16 continues to influence the time devoted to housework at age 34, via a second 2SLS model:

$$Y^{34}_i = \delta_0 + \delta_1\,\mathit{FamSize}^{16}_i + X_i'\delta_2 + \nu_i \qquad (2)$$

where $\mathit{FamSize}^{16}_i$ is again instrumented by $\mathit{SameSex}_i$. Here $Y^{34}_i$ corresponds to one of the three following dependent variables measured at age 34: the share of household tasks carried out by the wife (or female partner), the share of household tasks carried out by the husband (or male partner) and the housework gender gap (the difference between these two shares). The vector $X_i$ includes the same control variables as in model (1). We do not control for the socio-demographic characteristics of the cohort members at age 34 (e.g. labour-force status and number of children), as we suspect that these may mediate the effect of family size in childhood and, as such, are "bad controls" (Angrist and Pischke, 2008). We consider this potential mediation in Section 5.2. We arguably have fewer reasons to think that the exclusion restriction is violated in this second specification. It is possible that having a sister rather than a brother would make our cohort members more likely to contribute to household tasks in adulthood: if women contribute more to housework in their childhood family, they may accumulate a set of skills that can be transmitted to their siblings in the form of knowledge spillovers. However, we believe that such mechanisms are second-order and are hence unlikely to affect our estimates in a sizeable way.
Our second estimation sample covers individuals who are in a partnership at age 34, have at least one sibling at age 16, and have valid information on the household tasks performed at age 34 and on the controls. This produces a sample of 3,200 observations. On average, the cohort members in our estimation sample sort into couples where about half of the household tasks are carried out only by the woman, while only 15 percent of the tasks are carried out only by the man. Additional descriptive statistics for this sample are shown in Table A3.
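To make the three age-34 dependent variables concrete, the following is a minimal sketch of how the shares and the gap could be computed for one couple. It assumes, hypothetically, that each task is coded as done by the woman, by the man, or shared, and that unreported tasks are dropped from the denominator; the labels and the function name are illustrative, not BCS variable names.

```python
from typing import Optional, Sequence, Tuple


def housework_shares(tasks: Sequence[Optional[str]]) -> Tuple[float, float, float]:
    """Female share, male share and housework gender gap over reported tasks.

    Each entry is one of the hypothetical codes "woman", "man" or "shared";
    None marks a task with missing information and is left out of the denominator.
    """
    reported = [t for t in tasks if t is not None]
    if not reported:
        raise ValueError("no reported tasks")
    n = len(reported)
    female = sum(t == "woman" for t in reported) / n
    male = sum(t == "man" for t in reported) / n
    # Housework gender gap: difference between the two shares.
    return female, male, female - male


# Hypothetical couple with five tasks, one of them unreported.
print(housework_shares(["woman", "woman", "shared", "man", None]))
```

Shared tasks count in the denominator but in neither share, so the two shares need not sum to one.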
4 Family size and the contribution to household tasks at age 16

4.1 Main results
Table 1 shows both the OLS and 2SLS estimates of model 1. The first two columns refer to the whole sample of households with at least two children, while the sub-samples by child sex appear in columns (3) through (6).
The main variable of interest is family size. In the OLS estimates in column (1), an additional household member is positively and significantly associated with the contribution of the study child to household tasks. When we instrument Family size with Same sex, the 2SLS results in column (2) also reveal a positive and significant coefficient on Family size. This result is in line with Bianchi and Robinson (1997) and Gager et al. (1999): larger families increase children's contribution to household tasks.
Looking at the estimates in column (2), one additional sibling increases the share of tasks performed "regularly" by 5.6 percentage points, which is equal to 25 percent of a standard deviation of the dependent variable. The effect of an increase in family size also lies between the effect of having a working mother and that of being female. While both the OLS and IV estimates are positive and significantly different from zero at the 5% level, the OLS estimate is smaller than the IV estimate. This is in line with our hypothesis that negative selection into parenthood biases the OLS estimate of the effect of family size downwards. Figure 1 suggested that the effect of family size was mainly found for girls. We formally check for a difference between boys and girls in columns (3) to (6). Both the OLS and 2SLS estimates confirm that an increase in family size translates into a significantly higher contribution of girls to household tasks at age 16, while it does not affect the contribution of boys. The positive family-size coefficient in columns (1) and (2) is thus mostly driven by girls. Note that when we do not control for variables that are arguably endogenous (e.g. children's cognitive skills and mothers' labour-force participation), the estimates for family size are not statistically different from those in Table 1.
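The logic of instrumenting family size with the sex composition of the first two children, and of why OLS can fall below 2SLS under negative selection, can be illustrated with a small simulation. This is a minimal numpy sketch under a hypothetical data-generating process (a continuous stand-in for family size, one exogenous control `x`, and an unobserved family trait `u` that lowers family size but raises chores); it computes textbook 2SLS point estimates and is not the authors' code or specification.

```python
import numpy as np


def ols(y, X):
    """Least-squares coefficients of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def two_sls(y, d, z, controls):
    """2SLS point estimate on the endogenous regressor d, instrumented by z.

    A constant is added to the exogenous controls. Only the point estimate is
    returned: naive second-stage standard errors would be wrong.
    """
    exog = np.column_stack([np.ones(len(y)), controls])
    first_stage = np.column_stack([z, exog])
    d_hat = first_stage @ ols(d, first_stage)  # fitted family size
    return ols(y, np.column_stack([d_hat, exog]))[0]


rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)                 # an exogenous control
u = rng.normal(size=n)                 # unobserved family trait
same_sex = rng.integers(0, 2, size=n)  # first two children of the same sex
# Hypothetical process: u lowers family size but raises chores, so OLS is
# biased downwards; the true effect of family size is 0.05.
family_size = 2 + 0.5 * same_sex + 0.2 * x - 0.5 * u + rng.normal(size=n)
chores = 0.05 * family_size + 0.03 * x + 0.1 * u + rng.normal(scale=0.2, size=n)

b_ols = ols(chores, np.column_stack([family_size, np.ones(n), x]))[0]
b_iv = two_sls(chores, family_size, same_sex, x)
print(f"OLS: {b_ols:.3f}  2SLS: {b_iv:.3f}")
```

Under these assumed parameters OLS comes out well below the true value of 0.05, while 2SLS recovers it up to sampling noise, mirroring the OLS/IV ordering reported above.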
We may expect a smaller effect of family size in families that outsource housework to the market. The outsourcing of housework is not accurately measured in the BCS, so we use household income as a proxy for the probability of hiring help. We thus expect the effect of family size to be smaller in richer families, as they are more likely to hire help for home production. We also expect the effect of family size to be stronger in families with traditional gender attitudes. Parents here might believe that their daughters (but not their sons) will enjoy a marriage-market premium if endowed with a set of domestic skills (this is consistent with the matching model developed by Chiappori et al., 2009, under the assumption of traditional household roles). Mothers of BCS members were asked their opinions on a variety of subjects in 1980, and the data providers used a principal component analysis to calculate an index of attitudes towards maternal employment.
The attitudes and opinions of the fathers were not collected but, under the assumption of assortative mating, we can assume them to be correlated with those of the mother.
The results appear in Table 2, where we split the estimation sample by gender, family income (above or below the third quartile) and the mother's adherence to conservative gender norms (above or below the third quartile of the distribution of the index of attitudes towards maternal employment). Family size significantly increases the contribution of girls to housework in low-income families but not in richer families. The effect of family size on boys' contribution to household tasks remains insignificant in both cases. We also see that girls with conservative mothers are more likely to contribute to household tasks as family size rises; there is no significant effect for girls with non-conservative mothers or for boys.
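The sample splits at the third quartile of income (or of the attitudes index) can be sketched as follows. For brevity, the sketch swaps in the Wald estimator, i.e. just-identified IV with a binary instrument and no controls, rather than the full 2SLS with controls; the income variable, the continuous family-size stand-in and the data-generating process are all hypothetical.

```python
import numpy as np


def wald_iv(y, d, z):
    """Wald estimator: reduced form divided by first stage (no controls)."""
    z = z.astype(bool)
    reduced_form = y[z].mean() - y[~z].mean()
    first_stage = d[z].mean() - d[~z].mean()
    return reduced_form / first_stage


rng = np.random.default_rng(1)
n = 200_000
income = rng.lognormal(mean=0.0, sigma=0.5, size=n)   # hypothetical income measure
same_sex = rng.integers(0, 2, size=n)                  # instrument
family_size = 2 + 0.5 * same_sex + rng.normal(size=n)  # continuous stand-in
below_q3 = income <= np.quantile(income, 0.75)         # split at the third quartile
# Hypothetical process: family size raises chores only below the third quartile.
slope = np.where(below_q3, 0.05, 0.0)
chores = slope * family_size + rng.normal(scale=0.2, size=n)

estimates = {
    label: wald_iv(chores[mask], family_size[mask], same_sex[mask])
    for label, mask in [("below Q3", below_q3), ("above Q3", ~below_q3)]
}
print(estimates)
```

Estimating the model separately on each side of the cut-off, rather than interacting family size with an income dummy, lets every coefficient (including the first stage) differ across the two groups.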

Sensitivity to the definition of the share of housework
We measure the contribution to housework at age 16 using the share of household tasks the cohort member helps with "Regularly". Cohort members report their contribution to twelve different household tasks. Due to missing information and survey filters, the average number of reported tasks in our estimation sample is 9.3 and the median is 10. One may worry that the number of reported tasks partly drives our estimates. We address this concern in two ways: we first add the number of reported tasks as an additional control variable, and then re-estimate our main regressions using only cohort members who report at least ten of the twelve tasks. The results, compared to those from our baseline estimation, appear in the first three rows of Table A4. The first row shows our baseline estimates of family size for the whole sample and then for girls and boys separately. In the second row, controlling for the number of reported tasks makes no difference. The third row of Table A4 shows the 2SLS coefficients for individuals reporting at least ten tasks. Here, the effect of family size for girls remains unchanged, that for the whole sample is somewhat smaller, and that for boys is negative but not statistically different from zero.
Rather than only distinguishing tasks performed "Regularly" from those performed "Sometimes" or "Rarely or never", we can also make use of the intermediate category "Sometimes". To do so, we assign a score of 1 to tasks performed "Regularly", a score of 0.5 to those performed "Sometimes" and a score of 0 otherwise. As for our original dependent variable, we calculate the share as the average score across the reported tasks. Using this new dependent variable does not affect our conclusions: as shown in the last row of Part A of Table A4, an additional family member still increases the contribution to household tasks in the whole sample and, once again, the result is mostly driven by girls.
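The baseline and graded dependent variables, and the count of reported tasks used as a control or as a sample filter, can be sketched as below. This is a minimal illustration assuming one answer per task, with `None` standing for a task that was not reported; the function names are hypothetical.

```python
from typing import Optional, Sequence

# Scores for the graded variant described in the text.
GRADED_SCORES = {"Regularly": 1.0, "Sometimes": 0.5, "Rarely or never": 0.0}


def task_share(responses: Sequence[Optional[str]], graded: bool = False) -> float:
    """Average score across the reported tasks (None = task not reported).

    Baseline: 1 for "Regularly", 0 otherwise.
    Graded:   1 / 0.5 / 0 for "Regularly" / "Sometimes" / "Rarely or never".
    """
    reported = [r for r in responses if r is not None]
    if not reported:
        raise ValueError("no reported tasks")
    if graded:
        scores = [GRADED_SCORES[r] for r in reported]
    else:
        scores = [1.0 if r == "Regularly" else 0.0 for r in reported]
    return sum(scores) / len(scores)


def n_reported(responses: Sequence[Optional[str]]) -> int:
    """Number of reported tasks: usable as a control or as a sample filter."""
    return sum(r is not None for r in responses)


# Hypothetical answers for four of the twelve tasks (one not reported).
answers = ["Regularly", "Sometimes", "Rarely or never", None]
print(task_share(answers), task_share(answers, graded=True), n_reported(answers) >= 10)
```

Because both shares divide by the number of reported tasks, a respondent with many missing answers can still obtain a high share, which is why the number of reported tasks is worth checking separately.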

Different definitions of tasks
Our main measure of household contribution uses a set of twelve tasks that have different features. As revealed in Table A5, most tasks are gender-specific. We consider a task to be "feminine" ("masculine") if the share of girls (boys) reporting doing the task "Regularly" is statistically larger than the share of boys (girls) at the 5% level. Girls are significantly more likely to regularly do the shopping, wash up, clean, make the bed, cook, look after pets, do the washing and ironing, and look after younger siblings, while boys are more likely to regularly do the gardening, clean the car, and take part in DIY activities. The share of girls looking after older people "regularly" is slightly larger than the share of boys, but the difference is not statistically significant at the 5% level. In the first two rows of Part B of Table A4 we check whether family size affects the contribution of cohort members to "feminine" and "masculine" tasks differently. We find that, as family size rises, girls perform a significantly larger share of both "feminine" and "masculine" tasks (although their contribution to the former is larger than to the latter), while boys do not contribute more to either type of task. Note that this partition of housework into "feminine" and "masculine" tasks almost perfectly overlaps with the intrinsic periodicity of the tasks (e.g. cooking and making the bed are daily activities, while a car needs to be cleaned less frequently), so that our results can also be interpreted in terms of task frequency.
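The classification of tasks as "feminine" or "masculine" amounts to a test of equal proportions for each task. A minimal sketch follows, assuming a pooled two-sided two-sample z-test (the text does not state which test was used) and hypothetical counts of respondents doing each task "Regularly".

```python
import math
from typing import Tuple


def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> Tuple[float, float]:
    """Pooled two-sample z-test of equal proportions; returns (z, two-sided p)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def classify_task(girls_regular: int, n_girls: int,
                  boys_regular: int, n_boys: int, alpha: float = 0.05) -> str:
    """Label a task "feminine", "masculine" or "neutral" from the shares of
    girls and boys doing it "Regularly"."""
    z, p = two_proportion_z(girls_regular, n_girls, boys_regular, n_boys)
    if p < alpha:
        return "feminine" if z > 0 else "masculine"
    return "neutral"


# Hypothetical counts out of 1,000 girls and 1,000 boys.
print(classify_task(600, 1000, 400, 1000))  # girls clearly more often
print(classify_task(100, 1000, 300, 1000))  # boys clearly more often
print(classify_task(105, 1000, 100, 1000))  # small, insignificant gap
```

A task such as looking after older people, where girls' share is only slightly larger, would fall into the "neutral" group under this rule.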
In addition, some of the tasks require the presence of particular items or persons in the household. This is the case, for instance, for care-giving tasks or for cleaning the car and tending the garden. Of course, we cannot assume that all households in our sample satisfy the preconditions for these kinds of tasks to be performed. We therefore exclude them in row three of Part B of Table A4, where we construct the share of household tasks carried out by cohort members based only on "unconditional" tasks, i.e. tasks that can be carried out in any household. We find that the effect of family size is even stronger when using this measure of housework contribution.
By pooling the twelve types of household tasks into a single measure, we also implicitly assume that the demand for all tasks rises equally with family size. This is plausible for some of our tasks, such as shopping, washing up, cleaning, making the bed, cooking, washing and ironing, and looking after youngsters (Bawa and Ghosh, 1999). It is however more difficult to believe that looking after the elderly and pets, gardening, cleaning the car, and painting or decorating are more likely to be regularly performed in families with more children. We therefore expect our main estimates to be driven by the first set of tasks, while the second set can be seen as a placebo test. This is confirmed in the fourth and fifth rows of Part B of Table A4. A girl who grew up in a large family contributes significantly more to those tasks for which demand likely rises with family size.
However, there is no significant effect for girls when considering their contribution to the second set of tasks, which we expect to be less sensitive to family size. We continue to find no effect for boys for either kind of task.
The last row of Table A4 excludes care-giving activities and only considers the contribution to household tasks that do not involve social interactions. Again, we find results that are in line with our baseline estimates: a larger family increases girls' contribution to housework but not that of boys.

Other measures of time: homework and leisure
Time is a finite resource. As family size rises and girls contribute more to household tasks, we expect a reduction in the time they spend on homework and leisure. The overall time allocation of boys should instead remain unchanged. We check this in the BCS by looking at time spent on a variety of activities.
We measure time spent on homework from the following question: "How much time did you spend doing homework yesterday?". Respondents answered using predefined time categories. Since more than two-thirds of our estimation sample reported doing no homework, we create a dummy for the cohort member having done at least some homework. Cohort members were also asked whether they had read at least one book during the four weeks before the interview and whether they had been members of a sports club, a religious organisation, or any other youth organisation over the last 12 months. We construct an index of leisure activities as the share of these activities the cohort member engaged in.
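The two outcome measures just described reduce to a dummy and a simple share. A minimal sketch, in which the item names and the label of the "no homework" time band are hypothetical placeholders rather than BCS variable names:

```python
from typing import Mapping

# Hypothetical names for the four leisure items described in the text.
LEISURE_ITEMS = ("read_book", "sports_club", "religious_org", "youth_org")


def homework_dummy(band: str, none_label: str = "none") -> int:
    """1 if the cohort member reported any homework yesterday, 0 otherwise.

    `none_label` is a hypothetical label for the "no homework" time band.
    """
    return int(band != none_label)


def leisure_index(answers: Mapping[str, bool]) -> float:
    """Share of the four leisure activities the cohort member engaged in."""
    return sum(bool(answers[item]) for item in LEISURE_ITEMS) / len(LEISURE_ITEMS)


print(homework_dummy("none"), homework_dummy("30 minutes to 1 hour"))
print(leisure_index({"read_book": True, "sports_club": True,
                     "religious_org": False, "youth_org": False}))
```

Collapsing the time bands into a dummy discards the intensive margin, which is the price of dealing with the large mass of respondents reporting no homework.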
We re-estimate our main model using our measures of homework and leisure as the dependent variables. Table A6 shows the results for the whole sample and by gender. Consistent with our main results, girls in larger families are less likely to have done any homework and engage in fewer leisure activities. As expected, there is no effect for boys. As in Table 2, the effect of family size is stronger for girls who grew up in low-income families and with conservative mothers (these results are available upon request).

9 The large difference in sample size between the first three columns of Table A6 and Table 1 reflects the fact that "time spent on homework yesterday" and "contribution to housework" were measured using different questionnaires. According to the data provider's documentation, the response rate of the former was much lower than that of the latter. Our measures of "participation in activities" and "contribution to housework" come from the same questionnaire, and the difference of approximately 100 observations here is due to missing information.
5 Family size and contribution to household tasks at age 34

Main results
We now ask whether the effect of family size at age 16 persists into adulthood and affects the division of the housework in households formed by BCS respondents and their partners at age 34. To do so, we replicate our IV analysis using as the dependent variables the share of household tasks performed by the female partner, by the male partner, and the housework gender gap (i.e. the difference between the two shares).
The estimates in the first three columns of Table 3 confirm a persistent effect of family size at age 16 on the division of household tasks at age 34. Larger families at age 16 predict a greater share of household tasks done by women, while the male share remains unchanged. As expected, column (3) then shows that the larger the family at age 16, the greater the housework gender gap at age 34. As such, cohort members who grew up in larger families sort into couples that conform more to stereotypical gender roles and in which the housework gender gap is even larger.
Columns (4) to (9) then ask whether this result is stronger for female or male cohort members.
It appears that only women sort into households with a significantly larger housework gender gap as family size at age 16 rises. As revealed in columns (4) and (5), this is mostly explained by a significantly higher share of household tasks predominantly carried out by the wife. In the last three columns of Table 3, there is some evidence that male cohort members who grew up in larger families have a larger housework gender gap in their adult household, although the estimated coefficient is not statistically significant.
The previous section established that family size has no significant effect on the contribution to household tasks for respondents who grew up in high-income families or with relatively less conservative mothers. We again look at this kind of heterogeneity, with the results for women appearing in the first row of Table 4. The pattern here is similar: larger families have no impact on the contribution to household tasks or on the housework gender gap at age 34 for women from relatively well-off families or with non-conservative mothers. Consistent with the childhood results, women raised in large low-income families contribute significantly (at the 5% level) more to household tasks and sort into couples with a larger housework gender gap. We find no significant results for men (see Table A8 in the Appendix).
We replicate this analysis for cohort members at age 42 to check that our estimates are not a statistical artefact of the choice of a particular survey year: the results in Table A9 are qualitatively similar.

Channels
Why does family size at age 16 continue to explain the individual's contribution to housework 18 or more years later? In Table 4 we explore the effect of family size on adult characteristics that are likely to help shape the housework gender gap of female cohort members (we replicate the exercise for male cohort members in Table A8).
Section 2.2 suggested that children who grew up in larger families may have lower education and thus worse labour-market outcomes. We investigate this in Table 4. We then look at the effect of age-16 family size on a set of labour-market outcomes, namely employment, the monthly wage (in logs), weekly working hours and commuting time. Commuting in the BCS is measured in time bands, and we here create a dummy for commuting time of over 30 minutes.
Only women from low-income families have a significantly lower probability of being employed and, when employed, spend less time commuting. This is in line with the burgeoning literature on gendered preferences over workplace amenities (Mas and Pallais, 2017) and local labour markets (Manning and Petrongolo, 2017). Our results provide indirect evidence that the definition of local labour market might differ by gender, due to the different costs associated with distance from the workplace: women might face social constraints that confine them to 'even more local' labour markets.
Only limited information is available on the partners of BCS respondents. However, we can estimate the causal effect of family size on the probability of having an employed partner. We find here positive and significant estimates in almost all our samples: women who grew up in large families tend to sort into couples where their partner is more likely to be employed.
We now turn to life-course transitions. According to Baxter et al. (2008), the housework gender gap does not change with marriage but does increase as individuals enter parenthood, and fertility has been shown to be transmitted across generations (Anderton et al., 1987; Booth and Kee, 2009; Kolk, 2014; Fasang and Raab, 2014). The effect of family size at age 16 on the housework gender gap might therefore operate through the cohort members' own number of children. Table 4 asks whether family size affects the probability of being married at age 34, as well as the probability of being a parent and the number of children. There is some evidence that family size increases the probability of marriage.
Our fertility results are partly in line with Cools and Hart (2017): only men's fertility appears to be positively influenced by their own family size in childhood, and even then not at conventional significance levels (see Table A8). By contrast, women's fertility decisions are not affected by their number of siblings and hence do not lie behind the effect of family size at age 16 on the housework gender gap at age 34.
We may then conjecture that the disproportionate rise in girls' housework contribution relative to boys' in larger families reinforces adherence to conservative gender norms, and that this explains the results in Table 4. Individuals who grew up in families in which their housework participation depended on their gender might internalise the norm that women should perform the lion's share of housework. As such, they may not support women spending most of their time in the labour market. While gender attitudes are not measured in the BCS at age 34, the questionnaire at age 30 includes some opinion questions. To measure the degree to which a respondent holds conservative gender attitudes, we use the extent of agreement with the following statement: "Family life suffers if the mother works full-time". The possible responses are: "Strongly Agree", "Agree", "Neither Agree nor Disagree", "Disagree" and "Strongly Disagree". We use a dummy for the respondent saying they "Strongly Agree" or "Agree" with this statement to see whether a stronger adherence to conservative gender norms is causally affected by family size at age 16. The results appear in the last rows of Table 4 and can be interpreted as a convergent validity test of a shift towards more traditional behaviour and attitudes. A larger family increases the probability that female respondents hold conservative gender attitudes, although the estimate is only significant for women who grew up in low-income families. Women who grew up in large families are also more likely to disagree with the statements "Kids benefit if mum has a job outside home" and "A mother and her family are happier if she goes out to work" (results are available upon request).

The results above suggest that women who grew up in large families, and particularly in low-income families, sort into partnerships with more traditional gender roles, i.e. with a lower probability of employment and shorter commuting time for the woman, a higher probability of employment for the husband, and a greater probability of being married rather than cohabiting. But does the adoption of these gender roles fully explain the persistence of the housework gender gap? We show the effects of childhood family size on the housework gender gap across our different samples in the first column of Table 5. We then add, one by one, the different adult characteristics that are influenced by family size in childhood, in order to appraise them as potential mediators. As revealed in columns (2) to (4), controlling for the employment probability and commuting time of the cohort members, as well as the employment status of their husbands, is sufficient to explain the persistence of the housework gender gap. Women in Panel 2 (i.e. from low-income families) are an exception: the effect of family size persists even when we hold these labour-market outcomes constant. However, this estimated effect is no longer different from zero once marital status is held constant. These results confirm that the long-lasting influence of family size in childhood on the housework gender gap is mostly due to the adoption of behaviours that conform to traditional gender roles.

11 One may worry that the significance of our estimates in Table 4 is a result of multiple hypothesis testing. In the Table, 16 out of 60 estimated coefficients are statistically significant at least at the 10% level. The probability that 16 or more out of 60 coefficients are significant at the 10% level by chance is only 0.02%. We are also reassured by the fact that in the subsample of regressions based on women who grew up in low-income families, for whom our results appear to be stronger, the probability that 4 or more coefficients are significantly different from zero at the 5% level by chance is 0.22%.
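The probabilities quoted in footnote 11 are binomial tail probabilities, which treat the tests as independent (a simplification, since coefficients estimated on overlapping samples are correlated). The sketch below reproduces both figures; the value of 12 coefficients for the low-income subsample is our assumption (60 coefficients spread evenly over five samples), not a number stated in the text, and it is the value consistent with the reported 0.22%.

```python
from math import comb


def prob_at_least(k: int, n: int, alpha: float) -> float:
    """P(X >= k) for X ~ Binomial(n, alpha).

    Interpreted as the chance that k or more of n tests are significant at
    level alpha when all null hypotheses are true and the tests are independent.
    """
    return sum(comb(n, j) * alpha**j * (1 - alpha) ** (n - j) for j in range(k, n + 1))


# All of Table 4: 16 of 60 coefficients significant at the 10% level.
p_all = prob_at_least(16, 60, 0.10)
# Women from low-income families: 4 or more significant at 5%, assuming
# 12 coefficients in that subsample (an assumption, not stated in the text).
p_low_income = prob_at_least(4, 12, 0.05)
print(f"{100 * p_all:.2f}%  {100 * p_low_income:.2f}%")
```

Because positively correlated tests make clusters of significant results more likely than under independence, these figures are best read as rough benchmarks rather than exact family-wise error rates.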
12 Controlling for the other adult outcomes shown in Table 4 does not change the estimated coefficient on family size. Note also that none of the estimates in Panels 3 and 5 are significantly different from zero in the first column, and the inclusion of the different channels unsurprisingly does not change them qualitatively. We replicate this analysis for men in Table A10.
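The stepwise mediation exercise in Table 5, which adds adult outcomes as controls to the 2SLS model, can be illustrated on simulated data. This minimal sketch uses a hypothetical mechanism in which family size affects the housework gap only through the woman's employment (a continuous stand-in). It treats the mediator as an exogenous control, which is only valid if the mediator is not itself confounded with the outcome: this is the "bad control" concern raised for the age-34 model.

```python
import numpy as np


def two_sls(y, d, z, controls=None):
    """2SLS point estimate on d instrumented by z; controls are treated as exogenous."""
    n = len(y)
    if controls is None:
        exog = np.ones((n, 1))
    else:
        exog = np.column_stack([np.ones(n), controls])
    first_stage = np.column_stack([z, exog])
    d_hat = first_stage @ np.linalg.lstsq(first_stage, d, rcond=None)[0]
    second_stage = np.column_stack([d_hat, exog])
    return np.linalg.lstsq(second_stage, y, rcond=None)[0][0]


rng = np.random.default_rng(2)
n = 100_000
same_sex = rng.integers(0, 2, size=n)
family_size = 2 + 0.5 * same_sex + rng.normal(size=n)
# Hypothetical mechanism with no direct effect: family size lowers the woman's
# employment, which in turn widens the housework gap.
employment = -0.1 * family_size + rng.normal(scale=0.5, size=n)
gap = -0.5 * employment + rng.normal(scale=0.2, size=n)

total = two_sls(gap, family_size, same_sex)                         # about 0.05
direct = two_sls(gap, family_size, same_sex, controls=employment)  # about 0
print(f"without mediator: {total:.3f}  with mediator: {direct:.3f}")
```

In this stylised setting the family-size coefficient falls from roughly 0.05 to roughly zero once employment is held constant, which is the pattern read in Table 5 as full mediation.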

Conclusion
In this paper we have assessed the impact of childhood family size on the allocation of household tasks of British Cohort Study cohort members at age 16 and then at age 34. We account for the endogeneity of fertility by exploiting parents' preferences for variety in the sex mix of their offspring, and use the sex composition of the first two children as an instrumental-variable predictor of family size.
We find that family size significantly increases the probability that adolescents contribute to housework at the age of 16. However, we show that our estimates differ substantially by gender: only girls do more housework as the family size increases. This finding is not sensitive to the measurement of housework, and girls also spend relatively less time on leisure and homework in larger families. There is also heterogeneity by household income, as girls from low-income families do more housework as family size rises, but not those in richer households. This is consistent with richer parents being more likely to outsource housework and less likely to ask their children to help with the chores. We also find that the effect of family size on housework at age 16 is larger for girls whose mothers hold conservative attitudes.
The effect of family size in childhood is persistent: at age 34, female cohort members who grew up in large families are more likely to sort into couples in which the housework gender gap is significantly larger than that of women from smaller families. We again find that women from low-income families and with conservative mothers are behind this finding. We show that the long-term effect of childhood family size is explained by the adoption of behaviours that are in line with more conservative gender roles. First, women who grew up in large families are less likely to be employed, and when they are employed their commuting time is significantly shorter. They are also more likely to be married to employed partners who, in turn, have less time to spend on household chores.
What, then, is the role of public policy given our results? In a model of identity formation à la Akerlof and Kranton (2000, 2010), it can be argued that women who grew up in large families maximise their utility by respecting the behavioural prescriptions of the traditional gender attitudes into which they were socialised. If this were the case, women would find it fair (or, at least, not sub-optimal) to do more housework than their male partners and there would be no direct cost in terms of welfare (Lepinteur et al., 2016; Flèche et al., 2018). However, identity can be seen as a narrative, and as such can be interpreted as a flexible concept (Sveningsson and Alvesson, 2003; Ashforth, 2000). More specifically, it can adapt to act as a buffer against adverse life events (as in Ibarra, 2003, where changes in working identity are seen as a coping mechanism for unexpected changes in employment status).
Following the same line of thought, one may suggest that girls who grew up in larger families are more likely to adopt a conservative gender identity in order to rationalise the fact that they are asked to contribute more to chores as the housework load increases. We have shown that these girls perform significantly more housework than their partners when they turn 34 and have worse labour-market outcomes: the conservative identities of women who grew up in large families, which may partly form as a childhood coping mechanism, then have the potential to develop into a set of constraining norms, as in Collier (2016). This is in line with the literature showing that women in charge of most of the housework have limited opportunities for career and skills enhancement (Hirsch, 2005; Manning and Petrongolo, 2008; Russo and Hassink, 2008; Evertsson, 2013). Additionally, as argued by Mandel and Semyonov (2005) and Pettit and Hook (2005), conservative norms have the power to institutionalise economic inequality between women and men.
While identity and attitudes cannot realistically be changed in the short to medium run, there is arguably a role for policies that alleviate such constraints for low-income families.
For example, making it easier to outsource housework might help sustain fertility without widening the housework gender gap. Further evidence is needed for a precise evaluation of the potential welfare losses associated with the persistence of the housework gender gap.

Notes: Robust standard errors in parentheses. "Family size" indicates the number of siblings of the cohort member at age 16. All regressions control for the ethnicity of the cohort member, birth order dummies, the cognitive and non-cognitive skills of the cohort member at age 16, a dummy indicating whether the cohort member's parents are still living in the same household, family income dummies, years of education of the cohort member's parents, age of the parents at birth of the cohort member, an index measuring the attitude of the mother regarding maternal employment, a dummy indicating whether the gender composition of the siblings is balanced and regional dummies. Family size is instrumented by a dummy equal to one if the first two children in the household are of the same sex. Statistical significance is coded following the standard notation: *** if the p-value is lower than 0.01, ** if the p-value is lower than 0.05, * if the p-value is lower than 0.1.

(the number of siblings of the cohort member at age 16) for different dependent variables. All regressions control for the ethnicity of the cohort member, birth order dummies, the cognitive and non-cognitive skills of the cohort member at age 16, a dummy indicating whether the cohort member's parents are still living in the same household, family income dummies, years of education of the cohort member's parents, age of the parents at birth of the cohort member, an index measuring the attitude of the mother regarding maternal employment, a dummy indicating whether the gender composition of the siblings is balanced and regional dummies.
Family size is instrumented by a dummy equal to one if the first two children in the household are of the same sex. Statistical significance is coded following the standard notation: *** if the p-value is lower than 0.01, ** if the p-value is lower than 0.05, * if the p-value is lower than 0.1. The regressions based on these outcomes use a subsample of employed cohort members. Results for these outcomes are similar when also including individuals who are not employed and conditioning on employment.

Notes: Robust standard errors in parentheses. The Table reports 2SLS estimates of the coefficient for "Family size" (the number of siblings of the cohort member at age 16) under different sample specifications. All regressions control for the ethnicity of the cohort member, birth order dummies, the cognitive and non-cognitive skills of the cohort member at age 16, a dummy indicating whether the cohort member's parents are still living in the same household, family income dummies, years of education of the cohort member's parents, age of the parents at birth of the cohort member, an index measuring the attitude of the mother regarding maternal employment, a dummy indicating whether the gender composition of the siblings is balanced and regional dummies. Family size is instrumented by a dummy equal to one if the first two children in the household are of the same sex. Statistical significance is coded following the standard notation: *** if the p-value is lower than 0.01, ** if the p-value is lower than 0.05, * if the p-value is lower than 0.1.

Notes: The columns labeled "(1)" refer to the estimation sample at age 16. The column labeled "(2)" refers to the sample of cohort members living in households with at least two children but with missing information on household tasks. The column labeled "(3)" refers to the overall BCS population with non-missing information on the covariates shown in the table.
Columns "(1)-(2)" and "(1)-(3)" refer respectively to the differences in means between column (1) and column (2) and between column (1) and column (3). Standard deviations are in square brackets, while standard errors are reported in parentheses. Statistical significance is coded following the standard notation: * * * if the p-value is lower than 0.01, * * if the p-value is lower than 0.05, * if the p-value is lower than 0.01.  Notes: Robust standard errors in parentheses. The Table reports 2SLS estimates of the coefficient for "Family size" (the number of siblings of the cohort member at age 16) under different definitions of the dependent variable. All regressions control for the ethnicity of the cohort member, birth order dummies, the cognitive and non-cognitive skills of the cohort member at age 16, a dummy indicating whether the cohort member's parents are still living in the same household, family income dummies, years of education of the cohort member's parents, age of the parents at birth of the cohort member, an index measuring the attitude of the mother regarding maternal employment, a dummy indicating whether the gender composition of the siblings is balanced and regional dummies. Family size is instrumented by a dummy equal one if the first two children in the household are of the same sex. Statistical significance is coded following the standard notation: * * * if the p-value is lower than 0.01, * * if the p-value is lower than 0.05, * if the p-value is lower than 0.01. The lowest robust F-statistics is 24.4 and belongs to the 2SLS baseline regression with at least 10 reported tasks, for the boys sub-sample. Notes: Each household task is reduced to a dummy equal one if the task is performed regularly, zero otherwise. Standard deviations are in square brackets, while standard errors are reported in parentheses. 
Statistical significance is coded following the standard notation: * * * if the p-value is lower than 0.01, * * if the p-value is lower than 0.05, * if the p-value is lower than 0.01. Notes: Robust standard errors in parentheses. "Family size" indicates the number of siblings of the cohort member at age 16. All regressions control for the ethnicity of the cohort member, birth order dummies, the cognitive and non-cognitive skills of the cohort member at age 16, a dummy indicating whether the cohort member's parents are still living in the same household, family income dummies, years of education of the cohort member's parents, age of the parents at birth of the cohort member, an index measuring the attitude of the mother regarding maternal employment, a dummy indicating whether the gender composition of the siblings is balanced and regional dummies. Family size is instrumented by a dummy equal one if the first two children in the household are of the same sex. Statistical significance is coded following the standard notation: * * * if the p-value is lower than 0.01, * * if the p-value is lower than 0.05, * if the p-value is lower than 0.01. Notes: Robust standard errors in parentheses. "Family size" indicates the number of siblings of the cohort member at age 16. All regressions control for the ethnicity of the cohort member, birth order dummies, the cognitive and non-cognitive skills of the cohort member at age 16, a dummy indicating whether the cohort member's parents are still living in the same household, family income dummies, years of education of the cohort member's parents, age of the parents at birth of the cohort member, an index measuring the attitude of the mother regarding maternal employment, a dummy indicating whether the gender composition of the siblings is balanced and regional dummies. Family size is instrumented by a dummy equal one if the first two children in the household are of the same sex. 
Statistical significance is coded following the standard notation: * * * if the p-value is lower than 0.01, * * if the p-value is lower than 0.05, * if the p-value is lower than 0.01. (the number of siblings of the cohort member at age 16) for different dependent variables. All regressions control for the ethnicity of the cohort member, birth order dummies, the cognitive and non-cognitive skills of the cohort member at age 16, a dummy indicating whether the cohort member's parents are still living in the same household, family income dummies, years of education of the cohort member's parents, age of the parents at birth of the cohort member, an index measuring the attitude of the mother regarding maternal employment, a dummy indicating whether the gender composition of the siblings is balanced and regional dummies. Family size is instrumented by a dummy equal one if the first two children in the household are of the same sex. Statistical significance is coded following the standard notation: * * * if the p-value is lower than 0.01, * * if the p-value is lower than 0.05, * if the p-value is lower than 0.01. The regressions based on these outcomes are based on a subsample of employed cohort members. Results for these outcomes are similar when including also individuals who are not employed and conditioning on employment. Notes: Robust standard errors in parentheses. "Family size" indicates the number of siblings of the cohort member at age 16. 
All regressions control for the ethnicity of the cohort member, birth order dummies, the cognitive and non-cognitive skills of the cohort member at age 16, a dummy indicating whether the cohort member's parents are still living in the same household, family income dummies, years of education of the cohort member's parents, age of the parents at birth of the cohort member, an index measuring the attitude of the mother regarding maternal employment, a dummy indicating whether the gender composition of the siblings is balanced and regional dummies. Family size is instrumented by a dummy equal one if the first two children in the household are of the same sex. Statistical significance is coded following the standard notation: * * * if the p-value is lower than 0.01, * * if the p-value is lower than 0.05, * if the p-value is lower than 0.01.  Table reports 2SLS estimates of the coefficient for "Family size" (the number of siblings of the cohort member at age 16) under different sample specifications. All regressions control for the ethnicity of the cohort member, birth order dummies, the cognitive and non-cognitive skills of the cohort member at age 16, a dummy indicating whether the cohort member's parents are still living in the same household, family income dummies, years of education of the cohort member's parents, age of the parents at birth of the cohort member, an index measuring the attitude of the mother regarding maternal employment, a dummy indicating whether the gender composition of the siblings is balanced and regional dummies. Family size is instrumented by a dummy equal one if the first two children in the household are of the same sex. Statistical significance is coded following the standard notation: * * * if the p-value is lower than 0.01, * * if the p-value is lower than 0.05, * if the p-value is lower than 0.01.
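The estimator described in these notes (2SLS with family size instrumented by the same-sex dummy, robust standard errors, and a robust first-stage F-statistic) can be sketched in a few lines. This is a minimal illustration on simulated data, not the authors' code: the function name `two_sls_same_sex`, the HC1 small-sample correction and the data-generating process are assumptions made for the example. With a single instrument the model is just-identified, so 2SLS reduces to (Z'X)^{-1}Z'y, and the robust first-stage F equals the squared robust t-statistic on the instrument.

```python
import numpy as np


def two_sls_same_sex(y, x, z, controls):
    """Just-identified 2SLS of y on one endogenous regressor x, instrumented by z.

    y: outcome (e.g. a housework measure), x: endogenous regressor (e.g. number
    of siblings), z: binary instrument (e.g. first two children of the same sex),
    controls: (n, p) array of exogenous covariates; a constant is added here.
    Returns (beta_x, robust_se_x, first_stage_robust_F). Hypothetical helper.
    """
    n = len(y)
    W = np.column_stack([np.ones(n), controls])   # constant + exogenous controls
    X = np.column_stack([x, W])                   # second-stage regressors
    Z = np.column_stack([z, W])                   # instruments (z replaces x)
    k = X.shape[1]

    # With one instrument per endogenous regressor, 2SLS = (Z'X)^{-1} Z'y.
    A_inv = np.linalg.inv(Z.T @ X)
    beta = A_inv @ (Z.T @ y)
    u = y - X @ beta                              # residuals use x itself, not fitted x
    meat = (Z * (u ** 2)[:, None]).T @ Z          # Z' diag(u^2) Z
    V = A_inv @ meat @ A_inv.T * n / (n - k)      # HC1 heteroskedasticity-robust

    # First stage: OLS of x on z and controls; robust F = squared robust t on z.
    ZZ_inv = np.linalg.inv(Z.T @ Z)
    pi = ZZ_inv @ (Z.T @ x)
    e = x - Z @ pi
    V1 = ZZ_inv @ ((Z * (e ** 2)[:, None]).T @ Z) @ ZZ_inv * n / (n - k)
    first_stage_F = pi[0] ** 2 / V1[0, 0]

    return beta[0], np.sqrt(V[0, 0]), first_stage_F


# Simulated data (not BCS data): an unobserved factor v raises both x and y,
# so OLS is biased upward while z is a valid instrument by construction.
rng = np.random.default_rng(0)
n = 200_000
controls = rng.normal(size=(n, 2))
z = rng.integers(0, 2, size=n).astype(float)
v = rng.normal(size=n)
x = 1 + 0.3 * z + controls @ np.array([0.2, -0.1]) + v + rng.normal(size=n)
y = 0.05 * x + controls @ np.array([0.1, 0.1]) + 0.5 * v + rng.normal(size=n)

beta_iv, se_iv, f_first = two_sls_same_sex(y, x, z, controls)
X_ols = np.column_stack([x, np.ones(n), controls])
beta_ols = np.linalg.lstsq(X_ols, y, rcond=None)[0][0]
print(f"2SLS: {beta_iv:.3f} (robust SE {se_iv:.3f}), first-stage F {f_first:.1f}; OLS: {beta_ols:.3f}")
```

Because the simulated unobserved factor `v` raises both the regressor and the outcome, OLS is biased upward here, whereas the 2SLS estimate should lie close to the true coefficient of 0.05.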

Appendix A: Additional tables
correlated with the dependent variable through a channel other than family size. While Angrist and Evans (1998) used mothers' labour-force participation as their main outcome, we consider a child-level outcome that may well be correlated with the sex composition of the first two children. If we assume ex ante one of our key findings, namely that girls contribute more to housework than boys (Gager et al., 1999), the sex composition of the first two children in the household may directly affect all siblings' contribution to housework via a substitution effect: relative to having a brother, having a sister will always reduce the residual amount of housework to be carried out. This in turn will affect the cohort member's housework contribution through a channel other than family size. While we partially take sibling composition into account by including a dummy for there being at least one girl and one boy in the household, we cannot fully rule out this threat to identification. Below we present an extensive discussion of the conditions under which the exclusion restriction of our instrument might not hold, a set of tests, and an analysis of the plausible exogeneity of the instrument following the approach of Conley et al. (2012). We argue that in most cases the potential violation would bias our estimates towards zero, and we provide some descriptive evidence against the one problematic case.
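As a concrete illustration of how the instrument and the sibling-composition control described above might be coded, the sketch below builds both dummies from the children's sexes listed in birth order. The function name `sibling_dummies` and the "F"/"M" coding are hypothetical; the definitions follow the text: the instrument equals one if the first two children are of the same sex, and the control equals one if the household has at least one girl and one boy.

```python
from typing import Sequence, Tuple


def sibling_dummies(sexes_in_birth_order: Sequence[str]) -> Tuple[int, int]:
    """Return (same_sex, balanced) for a household with at least two children.

    sexes_in_birth_order: sexes of all children, eldest first, coded "F"/"M"
    (a hypothetical coding chosen for this example).
    same_sex: 1 if the first two children are of the same sex (the instrument).
    balanced: 1 if there is at least one girl and one boy (the control).
    """
    if len(sexes_in_birth_order) < 2:
        raise ValueError("the instrument is defined only for households with at least two children")
    first, second = sexes_in_birth_order[0], sexes_in_birth_order[1]
    same_sex = int(first == second)
    balanced = int("F" in sexes_in_birth_order and "M" in sexes_in_birth_order)
    return same_sex, balanced


print(sibling_dummies(["M", "M"]))       # (1, 0)
print(sibling_dummies(["F", "M"]))       # (0, 1)
print(sibling_dummies(["F", "F", "M"]))  # (1, 1)
```

Whenever the first two children differ in sex, the control is mechanically one, so among households with a balanced composition the instrument still varies (compare ["F", "M"] with ["F", "F", "M"]).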
We first carry out a placebo test. If the instrument were correlated with the dependent variable through a channel other than family size, we would expect to find a significant 2SLS family-size estimate even for household tasks that should not be related to family size. Here we refer to the estimates shown in the fourth and fifth rows of Panel B of Table A4. These show, respectively, the estimated family-size coefficients for household tasks that probably increase with family size and for those that likely do not. None of the 2SLS family-size estimates is significant for the latter (fifth row), while most of the estimates in the fourth row are significant. This suggests that the instrument is unlikely to affect the contribution to household tasks other than via its impact on family size.
We now go one step further and explore whether our instrument is systematically correlated with observable characteristics of the cohort members and their parents at age 16. We follow Falck et al. (2014) and derive reduced-form estimates from the regression of cohort members' and their parents' characteristics at age 16 (normally used as control variables in our baseline regressions) on the same-sex instrument and all other controls. A significant correlation here would indicate a potential violation of the exclusion restriction: if observable characteristics are correlated with our instrument, we may expect that unobservable characteristics are too. Table B1 shows the results for the whole sample and then by the cohort member's gender. Our instrument does not seem to be systematically correlated with the different characteristics reported.
Note that we do not report the correlation between the same-sex instrument and birth order, as birth order mechanically rises with family size. Out of the 52 estimates, only eight are significantly different from zero, most of them only at the 10% level. The only highly significant coefficients are those for the mother's age at the birth of the cohort member. While this correlation may be a source of concern, the estimates are arguably small in economic terms: for example, in column (1) mothers whose two first-born children are of the same sex were born on average six months later than other mothers with at least two children. The absence of selection into consecutive same-sex births based on observable characteristics mitigates our concerns about systematic correlations with unobservable characteristics.
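The balance check in the spirit of Falck et al. (2014) regresses each pre-determined characteristic on the instrument and the remaining controls, then inspects the coefficient on the instrument. The sketch below is a minimal version under stated assumptions: HC1 robust standard errors, simulated data, and hypothetical helper names `ols_hc1` and `balance_check`.

```python
import numpy as np


def ols_hc1(y, X):
    """OLS coefficients and HC1 robust standard errors; X must include a constant."""
    n, k = X.shape
    XX_inv = np.linalg.inv(X.T @ X)
    b = XX_inv @ (X.T @ y)
    u = y - X @ b
    V = XX_inv @ ((X * (u ** 2)[:, None]).T @ X) @ XX_inv * n / (n - k)
    return b, np.sqrt(np.diag(V))


def balance_check(characteristic, z, other_controls):
    """Regress one pre-determined characteristic on the instrument z and the
    remaining controls; return the coefficient on z and its robust t-statistic."""
    n = len(characteristic)
    X = np.column_stack([np.ones(n), z, other_controls])
    b, se = ols_hc1(characteristic, X)
    return b[1], b[1] / se[1]


# Simulated data (not BCS data).
rng = np.random.default_rng(1)
n = 5_000
z = rng.integers(0, 2, size=n).astype(float)
other = rng.normal(size=(n, 3))
unrelated_char = other @ np.array([0.5, 0.0, -0.2]) + rng.normal(size=n)  # independent of z
shifted_char = 0.5 * z + rng.normal(size=n)                               # shifted by z

coef_ok, t_ok = balance_check(unrelated_char, z, other)
coef_bad, t_bad = balance_check(shifted_char, z, other)
print(f"unrelated: coef={coef_ok:.3f}, t={t_ok:.2f}; shifted: coef={coef_bad:.3f}, t={t_bad:.2f}")
```

A characteristic unrelated to the instrument should yield a small t-statistic, while one shifted by the instrument is flagged clearly. When many characteristics are tested, about one in ten would be flagged at the 10% level by chance alone if the instrument were as good as randomly assigned and the tests were independent.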
We can go one step further and allow for direct effects of the instrument on the dependent variable. As argued by Conley et al. (2012), the exclusion restriction is debatable for most instrumental variables, which is why they developed inference methods that remain valid when instruments are only plausibly exogenous. The intuition behind their method is to allow the instrument to have a direct effect on the dependent variable in the second stage of 2SLS estimation. We follow Nybom (2017) and relax the exogeneity of the instrument as follows:
$$\mathit{FamSize16}_i = \alpha_1 \mathit{SameSex}_i + \delta_1 X_i + \varepsilon_i$$
$$\mathit{HhTasks16}_i = \alpha_2 \mathit{FamSize16}_i + \gamma \mathit{SameSex}_i + \delta_2 X_i + \mu_i$$
where γ, the direct effect of the instrument, varies between -0.05 and 0.05. Note that the bounds of this interval are of the same magnitude as the effect of family size in column (2) of Table 1. γ can be interpreted as the share of the reduced-form effect of the instrument that is independent of family size. Our objective here is to identify the threshold at which our main 2SLS estimates become insignificant at the 10% level. Figure B1 shows the estimates of family size instrumented by the sex composition of the first two children in the household, first for the whole sample and then separately for girls and boys. Unsurprisingly, negative values of γ make our 2SLS estimates stronger. However, a positive γ means that the instrument has a direct and positive effect on the contribution to household tasks, which in turn may inflate our main estimates. If γ is greater than 0.01 for the whole sample and greater than 0.02 for girls, the 2SLS estimates of instrumented family size are no longer significantly different from zero. This is equivalent to saying that, as long as the direct effect of the instrument is smaller than roughly 20% of our main 2SLS estimate, the effect of family size on housework will remain significantly different from zero.
Notes: Robust standard errors in parentheses. "Family size" indicates the number of siblings of the cohort member at age 16. All regressions control for the ethnicity of the cohort member, birth order dummies, the cognitive and non-cognitive skills of the cohort member at age 16, a dummy indicating whether the cohort member's parents are still living in the same household, family income dummies, years of education of the cohort member's parents, age of the parents at birth of the cohort member, an index measuring the attitude of the mother regarding maternal employment, a dummy indicating whether the gender composition of the siblings is balanced, and regional dummies. Family size is instrumented by a dummy equal to one if the first two children in the household are of the same sex. Statistical significance is coded following the standard notation: *** if the p-value is lower than 0.01, ** if the p-value is lower than 0.05, * if the p-value is lower than 0.1. The lowest robust F-statistic is 24.4 and belongs to the 2SLS baseline regression with at least 10 reported tasks, for the boys subsample.
We now argue that if sibling composition has a direct impact on the contribution to housework, it is unlikely to be positive. Table B2 presents a typology of all possible sibling compositions, given the value of the same-sex instrument. Consider first the case in which the cohort member is the first- or second-born. If the cohort member is a boy, then the instrument Z will take value 0 if he has a sister (case a) and 1 if he has a brother (case b). If we assume that girls contribute more than boys to household tasks, then the cohort member will be more likely to perform a larger share of housework in case b than in case a. Hence, with 2SLS we would tend to overestimate the family-size coefficient due to the positive correlation between the instrument and the dependent variable, given all other covariates. This would be problematic if we found a positive effect of family size on the share of housework performed by boys, as we would not be able to distinguish whether the effect comes from the real association between the variables or from the violation of the exclusion restriction.
However, since we find no statistically significant effect of family size on the share of housework for boys, the real effect should be either zero or negative, which in either case corroborates the finding that the effect of family size is larger for girls than for boys. Now consider the case where the cohort member is a first- or second-born girl. The instrument takes value 0 when she has a brother (case c) and 1 when she has a sister (case d). Under the same assumption as above, the cohort member will be more likely to contribute more to housework in case c than in case d. Here, conditional on the controls, the instrument would be negatively correlated with the dependent variable. Again, we are not particularly worried about this potential violation of the exclusion restriction, as it would bias the coefficient of family size towards zero.
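The plausibly-exogenous sensitivity analysis discussed in this appendix fixes γ on a grid between -0.05 and 0.05 and re-estimates α2 for each value. Since γ is treated as known, the second-stage equation can be rewritten as HhTasks16 - γ·SameSex = α2·FamSize16 + δ2·X + µ, so standard 2SLS on the adjusted outcome applies. The sketch below illustrates this one simple variant of the Conley et al. (2012) idea on simulated data; the function names, the HC1 correction and the 1.645 critical value (two-sided 10% level under a normal approximation) are assumptions, not the authors' implementation.

```python
import numpy as np


def iv_coef_se(y, x, z, W):
    """Just-identified 2SLS coefficient on x (instrumented by z) and its HC1 robust SE.
    W holds the constant and the exogenous controls."""
    X = np.column_stack([x, W])
    Z = np.column_stack([z, W])
    n, k = X.shape
    A_inv = np.linalg.inv(Z.T @ X)
    beta = A_inv @ (Z.T @ y)
    u = y - X @ beta
    V = A_inv @ ((Z * (u ** 2)[:, None]).T @ Z) @ A_inv.T * n / (n - k)
    return beta[0], np.sqrt(V[0, 0])


def gamma_sensitivity(y, x, z, W, gammas, crit=1.645):
    """For each assumed direct effect gamma, subtract gamma * z from the outcome
    and re-estimate alpha_2 by 2SLS. Returns (gamma, alpha_2, se, significant) tuples."""
    rows = []
    for g in gammas:
        a2, se = iv_coef_se(y - g * z, x, z, W)
        rows.append((float(g), a2, se, bool(abs(a2 / se) > crit)))
    return rows


def breakdown_gamma(rows):
    """Smallest positive gamma on the grid at which alpha_2 is no longer significant."""
    for g, _, _, significant in sorted(rows):
        if g > 0 and not significant:
            return g
    return None


# Simulated data (not BCS data) with a true alpha_2 of 0.05 and no direct effect.
rng = np.random.default_rng(2)
n = 200_000
controls = rng.normal(size=(n, 2))
W = np.column_stack([np.ones(n), controls])
z = rng.integers(0, 2, size=n).astype(float)
v = rng.normal(size=n)
x = 1 + 0.3 * z + controls @ np.array([0.2, -0.1]) + v + rng.normal(size=n)
y = 0.05 * x + controls @ np.array([0.1, 0.1]) + 0.5 * v + rng.normal(size=n)

rows = gamma_sensitivity(y, x, z, W, np.linspace(-0.05, 0.05, 11))
for g, a2, se, sig in rows:
    print(f"gamma={g:+.2f}  alpha_2={a2:.3f}  se={se:.3f}  significant at 10%: {sig}")
print("breakdown gamma:", breakdown_gamma(rows))
```

With one instrument, the 2SLS estimate at a given γ equals the estimate at γ = 0 minus γ divided by the first-stage coefficient on the instrument, so the estimates fall linearly as γ rises; `breakdown_gamma` reports the first positive grid value at which significance is lost.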
We finally consider the case where the cohort member is neither the first- nor the second-born. The instrument now takes the value of one in two occurrences: either the first two children are both boys (cases b2 and d2) or both girls (cases b1 and d1). Irrespective of his or her gender, a cohort member with two older sisters would tend to perform relatively less housework compared to the case where the instrument is zero (cases a1 and c1). As argued above, the 2SLS estimate of family size would then be biased toward zero. Instead, when the cohort member has two older brothers (cases b2 and d2), there will be comparatively more housework to do, and he or she might be asked to contribute relatively more than in the case where the two eldest siblings are of opposite sexes. Here we may expect the 2SLS estimate to overstate the effect of family size. This is the most worrying case, as the effect size we estimate is potentially inflated. To check whether the sample of cohort members with this particular sibling mix is behind our results, we replicate our main results from Table 1