Child height, health and human capital: Evidence using genetic markers

Height has long been recognized as being associated with better outcomes: the question is whether this association is causal. We use children's genetic variants as instrumental variables to deal with possible unobserved confounders and examine the effect of child/adolescent height on a wide range of outcomes: academic performance, IQ, self-esteem, depression symptoms and behavioral problems. OLS findings show that taller children have higher IQ, perform better in school, and are less likely to have behavioral problems. The IV results differ: taller girls (but not boys) have better cognitive performance and, in contrast to the OLS, greater height appears to increase behavioral problems.


Introduction
The association between height and wealth has been noted in the academic literature for many decades. As early as the 17th century, Guarinoni, one of the founders of preventive medicine, pointed to the difference in growth rates between the rich in towns and the poor in the countryside (Tanner, 1982). More recent studies find height to be positively related to education (Magnusson et al., 2006) and income (Persico et al., 2004). The advantages associated with greater height have also been reported for children. For example, Case and Paxson (2008) find that taller children perform better in school tests than shorter children and suggest that the relationship between childhood height and income and education in adulthood is due to height being associated with greater intelligence. The next section begins by examining the possible mechanisms through which height may be related to the outcomes of interest. In Section 3, we set out our methodology and Section 4 describes the data. The results are presented in Section 5 and Section 6 concludes.
[...] subjected to war-time famine and postnatal, such as starvation in the early years of life) complete equality in height with siblings or peers is attained before puberty (Tanner, 1978).
In terms of the pre-natal diet, there is evidence that nutrition in utero plays an important role in child development. But nutrients that help some aspects of development may hurt others. For example, omega-3 fatty acids, obtained from fish and seafood, are crucial for brain development and have been associated with decreased hostility and aggression (Benton, 2007), but fish and seafood are also the primary source of (non-occupational) mercury exposure (Oken and Bellinger, 2008). Several studies have shown prenatal methylmercury exposure to be associated with decreased IQ and test scores (Axelrad et al., 2007; Cohen et al., 2005). Likewise, some studies find that maternal alcohol consumption and smoking during pregnancy negatively affect birth weight and child growth (Mills et al., 1984; Gilman et al., 2008). Lower birth weights in turn are associated with poorer cognitive performance (Richards et al., 2002; Ericson and Kallen, 1998) and behavioral development (Elgen et al., 2002), though the literature suggests that this relationship is driven by family background characteristics rather than a specific intrauterine effect (Yang et al., 2008). There is mixed evidence on the effects of maternal smoking and alcohol consumption during pregnancy on child outcomes, with some arguing that they worsen outcomes and others finding no effect (see e.g. Olds et al., 1994; Gilman et al., 2008; Kafouri et al., 2009; Nilsson, 2008; Russell, 1991).
Other potential confounders include genetic causes of both height and the outcome of interest. This may be especially important in this context, as both height and (for example) cognition are likely to be influenced by a large number of genes, each with very small effects. Indeed, some literature suggests that part of the height-intelligence association is driven by a genetic component (Sundet et al., 2005), though others find no evidence of this. For instance, comparing first and second born biological brothers in Sweden, Magnusson et al. (2006) find that the taller brother is significantly more likely to attend higher education. However, the height effect estimated between brothers is almost identical to that across all men, suggesting that the correlation between height and intelligence is not driven solely by genetic or environmental factors common to brothers.
This discussion suggests that a potential bias can go in either direction. If a well-balanced diet or the family's socioeconomic position positively affects height, but also leads to fewer behavioral problems, OLS is likely to underestimate the true effect of height on behavioral problems. If, however, this same diet leads to better educational outcomes, OLS is likely to overestimate the true effect on education. Conversely, if certain dietary components lead to decreased cognitive functioning, OLS may underestimate the true effect on educational outcomes and IQ. Under the assumptions we discuss in detail below, the use of the child's genetic markers as instrumental variables will shed more light on these issues and will allow us to estimate the causal effect of child height.
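The direction of the bias can be made explicit with the textbook omitted-variable formula. The sketch below assumes a linear outcome equation with a single unobserved confounder $U_i$ (for example diet or socio-economic position) and an error term uncorrelated with both $H_i$ and $U_i$; the symbols $\alpha$, $\beta$, $\gamma$, $U_i$ and $\varepsilon_i$ are introduced here for illustration and are not the paper's notation.

```latex
C_i = \alpha + \beta H_i + \gamma U_i + \varepsilon_i,
\qquad
\operatorname{plim}\, \hat{\beta}_{OLS}
  = \beta + \gamma \, \frac{\operatorname{Cov}(H_i, U_i)}{\operatorname{Var}(H_i)}
```

With $\operatorname{Cov}(H_i, U_i) > 0$, a confounder that lowers the outcome ($\gamma < 0$, as for behavioral problems) pulls the OLS coefficient below $\beta$, whereas one that raises the outcome ($\gamma > 0$, as for education) pushes it above $\beta$.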

The potential outcomes framework
We examine the impact of child height on three sets of outcomes: (1) cognitive skills, (2) mental health, and (3) behavioral problems. We discuss the outcomes in more detail below. As both height and outcomes differ by gender, we estimate all models separately for boys and girls. We model the relationship between height and outcomes using the potential outcomes framework, building on the work by Imbens and Angrist (1994) and Angrist et al. (1996), which has been of great importance in linking the econometric IV literature to the potential outcomes framework.
Let $C$, $H$ and $Z$ denote random variables representing, respectively, the outcome of interest, child height and the genetic variant used as the IV. For simplicity, we initially discuss the case of a binary instrument, though we consider the case of multivalued instruments below. $Z_i = 1$ indicates that individual $i$ carries the genetic variant; $Z_i = 0$ indicates that individual $i$ does not carry the genetic variant.
Let $H_i(z)$ be the potential height for individual $i$ when the instrument is set equal to $z$. Similarly, let $C_i(h, z)$ be the potential outcome for individual $i$ that would be obtained if height, the treatment variable, was set to $h$ and the instrument set to $z$. We refer to $H_i(z)$ and $C_i(h, z)$ as the potential treatments and potential outcomes respectively.
The individual treatment effect, or causal effect, is $C_i(h', z) - C_i(h, z)$, where $h$ is some baseline value. Under the exclusion restriction discussed below, we can write $C_i(h', z) = C_i(h')$. The causal estimand of interest can therefore be written in terms of the differences $C_i(h') - C_i(h)$. We follow Angrist et al. (2000), who specify the conditions under which the simple IV estimator identifies a weighted average of the derivative function of the non-linear causal response function. We discuss these assumptions in turn.

Assumption 1. (Independence)
Independence implies that the instrument is independent of the potential outcome and the potential height, for all values of h and z. In other words, the instrument is as good as randomly assigned.

Assumption 2. (Exclusion)
Exclusion implies that the potential outcomes, at any height h, are unchanged by the presence or absence of the genetic variant. In other words, the only way through which the instrument affects the potential outcome is via H.

Assumption 3. (Nonzero effect of instrument on height)
This implies that expected potential height is affected by the genetic variant and, therefore, that the instrument has an effect on treatment.

Assumption 4. (Monotonicity)
This means that the potential height for individual $i$ with the genetic variant is at least as high as the potential height for the same individual without the genetic variant.
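In the notation above, the four assumptions (independence, exclusion, nonzero effect and monotonicity) are commonly formalized as follows for a binary instrument. This is a sketch following the standard statement in Angrist et al. (2000); the exact displayed form is an assumption rather than a quotation.

```latex
\begin{align*}
\text{Independence:}   &\quad \{ C_i(h, z),\, H_i(z) \}_{h,\,z} \;\perp\; Z_i \\
\text{Exclusion:}      &\quad C_i(h, 0) = C_i(h, 1) \equiv C_i(h) \quad \text{for all } h \\
\text{Nonzero effect:} &\quad E[H_i(1) - H_i(0)] \neq 0 \\
\text{Monotonicity:}   &\quad H_i(1) \geq H_i(0) \quad \text{for all } i
\end{align*}
```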
Specifying heterogeneous responses, the potential outcome for individual $i$ can be written as a general function of $h$, say $C_i(h) \equiv g_i(h)$. Under the assumptions above, the instrumental variables estimand, defined as the ratio of the difference in average outcomes at two values of the instrument to the difference in average treatment at the same two values of the instrument, can be written in terms of $g_i'(q)$, the derivative of $g_i(h)$ with respect to $h$ evaluated at $q$. The IV estimator is therefore a weighted average of the derivative function (Angrist et al., 2000; Angrist and Pischke, 2009).
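For reference, the result is usually stated along the following lines for a binary instrument (see Angrist and Pischke, 2009). This is a reconstruction under the assumptions above; the weight function $\omega(q)$ is notation introduced here.

```latex
\frac{E[C_i \mid Z_i = 1] - E[C_i \mid Z_i = 0]}
     {E[H_i \mid Z_i = 1] - E[H_i \mid Z_i = 0]}
  = \int E\!\left[ g_i'(q) \mid H_i(0) < q < H_i(1) \right] \omega(q) \, dq,
\qquad
\omega(q) = \frac{P[H_i(0) < q < H_i(1)]}{\int P[H_i(0) < r < H_i(1)] \, dr}
```

The weights are non-negative and integrate to one, so heights $q$ that the instrument shifts more individuals across receive more weight.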
Although the above discussion uses a binary instrumental variable, we observe a multi-valued instrument. In the case of such discrete instruments, the IV estimate is a weighted average of the average causal derivatives calculated at each value of the instrument, where the weights are determined by the strength of the instrument on the treatment. Hence, the IV estimate is a weighted average of the derivative function at the different values of the instrumental variable (Angrist et al., 2000).
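To make the estimator concrete, the following minimal Python sketch computes the simple IV ratio for a multi-valued instrument and compares it with the OLS slope. The data are a hypothetical six-child toy example constructed so that an unobserved confounder `U` is exactly balanced across instrument values; the function names and all numbers are illustrative assumptions, not the paper's data or code.

```python
from statistics import mean


def covariance(x, y):
    """Sample covariance of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def ols_slope(outcome, height):
    """Slope from a simple regression of the outcome on height."""
    return covariance(outcome, height) / covariance(height, height)


def iv_slope(outcome, height, instrument):
    """Simple IV estimate: Cov(outcome, instrument) / Cov(height, instrument)."""
    return covariance(outcome, instrument) / covariance(height, instrument)


# Hypothetical toy data: instrument Z in {0, 1, 2} and an unobserved
# confounder U that is balanced across the values of Z.
Z = [0, 0, 1, 1, 2, 2]
U = [-1, 1, -1, 1, -1, 1]
H = [100 + 2 * z + u for z, u in zip(Z, U)]  # height rises with Z and with U
C = [0.5 * h - 3 * u for h, u in zip(H, U)]  # assumed true height effect: 0.5

iv_estimate = iv_slope(C, H, Z)   # uses only the variation in H driven by Z
ols_estimate = ols_slope(C, H)    # contaminated by the confounder U
```

In this constructed example the IV ratio equals the assumed effect of 0.5, while the OLS slope is negative because `U` raises height but lowers the outcome.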

The genetic variants
We use a set of nine genetic variants (single-nucleotide polymorphisms, SNPs; see the glossary in Table 1 and the Appendix) that have all been robustly associated with height among individuals of European ancestry. The nine variants we use are SNPs in the following genes: HMGA2 (rs1042725), ZBTB38 (rs6440003), GDF5 (rs6060373), LOC387103 (rs4549631), EFEMP1 (rs3791675), SCMH1 (rs6686842), ADAMTSL3 (rs10906982), DYM (rs8099594) and C6orf106 (rs2814993), where the rs number is a unique SNP identifier. 4 Mendelian randomization is valid assuming that, at the population level, the genetic variants are unrelated to the type of unmeasured lifestyle and socio-economic confounders that tend to distort interpretations of observational studies. The theory of random allocation of genetic variants and the empirical evidence on this suggest this is the case (Bhatti et al., 2005; Kivimäki et al., 2008; Lawlor et al., 2008; see also Fisher, 1952; Box, 2010; Bodmer, 2010). We discuss the assumptions in turn, relating this to our research question. 5

Assumption 1: Independence
One way to indirectly test Assumption 1 is by exploring whether the distribution of individual or family-level characteristics that are available in the data is the same in different groups defined by the value of the instrument. In Section 4.4, we examine the relationship between the genetic variants and a large set of child and family background characteristics. The idea is that, if the instrumental variable is indeed randomized, there should be no systematic variation in the covariates by genotype. This raises the question, however, of which covariates to test for, as any characteristic is, in principle, a post-treatment variable with respect to the instrument. Hence, any systematic variation in these indirect tests does not necessarily indicate a violation of independence (or exclusion). It may be, for example, that the instrument is picking up additional causal effects of the same risk factor, or that it is picking up reverse causation from the outcome to a different covariate.
One way through which the independence assumption can be violated is population stratification. This refers to a situation in which there is a systematic relationship between the allele frequency and the outcome in different population subgroups (see Table 1 and the Appendix for a definition of some genetic terms). For example, allele frequencies can vary across ethnic groups. If these groups also have systematically different educational outcomes that are not due to a genetic make-up, this could lead to an association between the two at the population level without an actual causal relationship, violating the independence Assumption 1. In other words, despite the fact that genotypes are randomly allocated and with that satisfy Independence, any population stratification can violate this assumption. This can be dealt with, however, by examining the question of interest within ethnic groups, separately analyzing the different sub-populations, and/or adjusting for principal components from genome-wide data that function as ancestry markers, relying on the conditional independence assumption. Population stratification is unlikely to affect our estimation, as our cohort is recruited from a specific geographically defined region, and fewer than 3% of the mothers reported that either they or their partner were from an ethnicity other than White European. With this small number of participants removed, a principal components analysis using genome-wide data in the cohort suggests that it consists of one population.
4 See e.g. Weedon et al. (2007, 2008), Lettre et al. (2008), Gudbjartsson et al. (2008), and Allen et al. (2010). For example, Weedon et al. (2008) identify 20 loci that robustly affect stature, including those used here, using a total of 30,147 individuals of European ancestry. These have since been confirmed in more independent samples (see e.g. Weedon et al., 2007; Lettre et al., 2008; Gudbjartsson et al., 2008; Allen et al., 2010). We use nine of the 20 SNPs identified by Weedon et al. (2008), as these were the only variants available in our data at the time of writing.
5 For a more detailed discussion of the use of genetic markers as instrumental variables from an economic perspective using a similar framework as the above, see von Hinke Kessler Scholder et al. (2011b). Lawlor et al. (2008) includes a more general discussion of the situations and (biological) processes that may invalidate Mendelian randomization studies.
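The indirect check of Assumption 1 described above amounts to asking whether covariates look alike across genotype groups. A minimal sketch of the tabulation step is given below; the function name and the toy numbers are hypothetical, and a formal check would add a regression or test of equality rather than rely on eyeballing group means.

```python
from collections import defaultdict
from statistics import mean


def covariate_means_by_genotype(genotypes, covariate):
    """Mean of a covariate within each genotype group.

    genotypes: one value per child (e.g. 0, 1 or 2 copies of an allele).
    covariate: one value per child (e.g. birth weight or family income).
    """
    groups = defaultdict(list)
    for g, x in zip(genotypes, covariate):
        groups[g].append(x)
    # Sort by genotype so the output order is stable and easy to read.
    return {g: mean(values) for g, values in sorted(groups.items())}


# Hypothetical example: the covariate has the same mean in every group,
# which is what random allocation of the variant would lead us to expect.
example = covariate_means_by_genotype([0, 0, 1, 1, 2, 2], [10, 12, 11, 11, 9, 13])
```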

Assumption 2: Exclusion
There are various situations that can violate the exclusion restriction. First, as individuals inherit their genes from their parents, it may be important to consider whether parents' behaviors are affected by their genotype (and hence are related to their offspring's genotype). In the presence of strong 'dynastic effects', genetic instruments may be invalid if they are related to parental behaviors that in turn affect the outcome of interest (Fletcher, 2011). For example, parents who carry 'tall' alleles may be treated differently because of their taller stature. If this affects their preferences for their child's education, Assumption 2 may be violated. The extent of this potential violation, however, will depend on the effect sizes of the variants. In our case, the genetic variants increase the average height by a relatively modest amount, which is unlikely to lead to strong (parental) responses.
Second, if the variants have multiple functions (also known as pleiotropy), Assumption 2 could be violated. This would occur, for example, if the variant has a direct effect on our outcome of interest (such as cognition or self-esteem) over and above its association with height. Similarly, if a variant is co-inherited with another genetic variant (known as being in linkage disequilibrium (LD)), violation of Assumption 2 depends on the effect of the co-inherited variant on the outcome of interest. The current evidence suggests that some height variants may indeed be pleiotropic or in LD (i.e. co-inherited) with other variants. For example, individuals with higher levels of GDF5 on average have both increased bone and cartilage growth (Sanna et al., 2008). However, there is currently no evidence that the variants used here additionally directly affect (or are in LD with variants that directly affect) our outcomes of interest or determinants thereof.

Table 1. A glossary of some genetic terms.
Alleles: One of two or more versions of a specific location on the DNA sequence. An individual has two alleles, one from each parent.
Base: Also called nucleotide. Bases are the 'building blocks' of DNA. DNA consists of four bases: adenine (A), cytosine (C), guanine (G) and thymine (T). It is the sequence of these four bases that encodes information.
Chromosome: A continuous piece of DNA that carries a collection of genes. Every cell in the human body contains 46 chromosomes.
DNA: Deoxyribonucleic acid (DNA) contains the genetic instructions used in the development and functioning of all living organisms. The DNA segments that carry the genetic information are called genes. The double-helix structure joins two strands of DNA, where the base A binds with T, and G binds with C.
Gene: A section on the chromosome that comprises a stretch of DNA.
Genotype: The specific set of two alleles inherited at a particular location on the DNA sequence. If the alleles are the same, the genotype is homozygous. If different, it is heterozygous.
Homozygous: When the two alleles at a particular locus are the same.
Heterozygous: When the two alleles at a particular locus are different.
Heritability: The proportion of the total variance that is explained by genetic factors. It is most commonly calculated from twin studies by comparing intra-pair correlations for the characteristic in monozygotic (MZ) twins with intra-pair correlations in dizygotic (DZ) twins. The heritability of a characteristic is calculated as twice the difference between MZ and DZ intra-pair correlations ($h^2 = 2 \times (r_{MZ} - r_{DZ})$).
Linkage disequilibrium (LD): The correlation between alleles at different loci within the population that occurs due to the co-inheritance of alleles. Alleles that are in LD are not independent of one another. The extent of LD is a function of the distance between the alleles on the chromosome.
Phenotype: An organism's observable characteristic or trait, such as its biochemical or physiological properties. Phenotypes result from the expression of genes as well as the influence of environmental factors and the interaction between the two.
Pleiotropy: The potential for variants to have more than one phenotypic effect. If a SNP is pleiotropic, it influences multiple phenotypes.
Polymorphism: Locations where DNA varies between individuals.
Population stratification: The presence of a systematic difference in allele frequencies between subpopulations within a population. The most common example is population stratification due to ethnicity.
Single-nucleotide polymorphism (SNP): A genetic variation in which a single base/nucleotide on the DNA is altered, e.g. the nucleotide T is changed to A.
We investigate the potential violation of the IV assumptions in a number of ways. First, we search the literature to identify evidence on the biological pathways of our variants, which may shed more light on the mechanisms through which they affect height. Medical and theoretical evidence suggesting that the SNPs only affect the outcome through their effect on height would in turn mitigate concerns about the exclusion restriction. Although the biological pathways are not known for all variants, Allen et al. (2010) show that a substantial number of the 180 SNPs they study, including some used here, are involved in growth-related processes. 6 Despite the absence of evidence of our SNPs directly affecting (determinants of) the outcomes of interest, and despite the biological pathways pointing to skeletal development and cell growth, we cannot guarantee that Assumption 2 holds. For instance, it is possible that some variants' pleiotropic effects (i.e. any additional effects independent from those on height) have simply not yet been identified. The 180 SNPs that have so far been identified explain 10% of the total variation in height. Hundreds, maybe thousands more effects are still lost in the genome (McEvoy and Visscher, 2009). Hence, it is possible for one (or more) of the nine instruments used here to be pleiotropic or in LD with a variant that directly affects our outcome. Based on the best available evidence, however, we assume this is not the case and that Assumption 2 holds. 7 We reiterate, though, that, as with any other IV approach, this remains an assumption, as we cannot test for this directly. In other words, its validity will never be known with complete certainty and can only be examined indirectly or falsified by the data.
When data are available on a large number of variants affecting the risk factor of interest, genetic confounding through pleiotropy or LD can be examined in more detail. More specifically, if multiple IV models -each using different independent combinations of these variants -predict a similar causal effect, this is very unlikely to be due to some common pleiotropy or LD across the different sets of variants, assuming that the different variants are located on different chromosomes and affect the trait via different pathways (Davey Smith, 2011;Palmer et al., 2011). Hence, if the different IV specifications display consistency, it provides some evidence against genetic confounding. One would ideally have a large number of variants available to thoroughly test for this, allowing for many different combinations of instrument sets without having to deal with weak instruments. Although the genetic data available to us is more limited, we explore this concept and investigate this further in Section 5.3.
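The consistency check described above can be sketched as follows: build an allele score from each disjoint set of SNPs, compute the simple IV ratio for each, and compare. The function and variable names, the two-SNP layout and the toy data are hypothetical; with real data each estimate would come with a standard error, and the comparison would be a formal test rather than an equality.

```python
from statistics import mean


def covariance(x, y):
    """Sample covariance of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def iv_by_instrument_set(outcome, height, genotypes, snp_sets):
    """Simple IV estimate of the height effect for each named set of SNPs.

    genotypes: one row per child, one allele count per SNP column.
    snp_sets:  mapping from a label to the list of SNP column indices
               whose allele counts are summed into that set's score.
    """
    estimates = {}
    for name, columns in snp_sets.items():
        score = [sum(row[j] for j in columns) for row in genotypes]
        estimates[name] = covariance(outcome, score) / covariance(height, score)
    return estimates


# Hypothetical toy data: two independent SNPs, each adding 2 units of height,
# and an outcome that depends on height only (assumed effect 0.5).
genotypes = [[0, 0], [1, 0], [0, 1], [1, 1]]
height = [100 + 2 * a + 2 * b for a, b in genotypes]
outcome = [0.5 * h for h in height]

estimates = iv_by_instrument_set(outcome, height, genotypes,
                                 {"set_a": [0], "set_b": [1]})
```

Here both instrument sets recover the same effect because neither SNP affects the outcome other than through height; a pleiotropic SNP in one set would drive the two estimates apart.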

Assumption 3: Nonzero effect of instrument on height
The prior knowledge on the effects of the variants, our use of a comparable sample of individuals of European ancestry, and the fact that these associations have been replicated in different independent samples justify the use of these variants and their compliance with Assumption 3. However, as gene-environment interactions in different samples can violate this assumption (see e.g. von Hinke Kessler Scholder et al., 2011a), Section 5.2 examines the strength of the instrument in our sample, using the standard statistical tests. Although the relationships between the SNPs and height are robust, their phenotypic effects (the actual effects on height) are small. To get around the problem of low power, we therefore combine the SNPs into a count of the total number of height-increasing ('tall') alleles carried by each child and use this count as the instrumental variable for height (see Section 4.4).
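A minimal sketch of the allele count is given below. It assumes each child's genotype is stored as the number of copies (0, 1 or 2) of the height-increasing allele at each of the nine SNPs, keyed by gene name; this data layout and the function name are assumptions made for illustration.

```python
SNPS = ["HMGA2", "ZBTB38", "GDF5", "LOC387103", "EFEMP1",
        "SCMH1", "ADAMTSL3", "DYM", "C6orf106"]


def tall_allele_count(genotype):
    """Total number of height-increasing alleles across the nine SNPs.

    genotype: dict mapping each SNP name to 0, 1 or 2 copies of the
    'tall' allele. The result ranges from 0 to 18.
    """
    count = 0
    for snp in SNPS:
        copies = genotype[snp]
        if copies not in (0, 1, 2):
            raise ValueError(f"{snp}: expected 0, 1 or 2 copies, got {copies!r}")
        count += copies
    return count


# Hypothetical child who is heterozygous at every SNP.
heterozygous_everywhere = {snp: 1 for snp in SNPS}
example_count = tall_allele_count(heterozygous_everywhere)
```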

Assumption 4: Monotonicity
Given random allocation of genetic variants and the fact that individuals do not know their genotypes, we assume that an individual who carries a 'tall' allele is at least as tall as the same individual, had she not carried the 'tall' allele, thus satisfying the monotonicity Assumption 4. As this relies on knowing each individual's counterfactual, this remains an assumption. The literature only shows that, at a group or population level, those who possess the genetic variant are taller than those who do not. The assumption could, for example, be violated in the presence of gene-environment interactions, though we are not aware of any evidence of this for the SNPs used here.

Data
We use data from a cohort of children born in the Avon area of England. Avon has approximately 1 million inhabitants, including 0.5 million in its main city, Bristol. Women eligible for enrollment in the population-based Avon Longitudinal Study of Parents and Children (ALSPAC) had an expected delivery date between 1 April 1991 and 31 December 1992. Approximately 85% of these mothers enrolled, leading to about 14,000 pregnancies. The Avon area is broadly representative of the UK, though mothers were slightly more affluent compared to the general population (Golding et al., 2001; see www.bris.ac.uk/alspac for a more detailed description of the sample, its enrollment, and response rates). Note that ALSPAC is a cohort; there is no systematic data collection on siblings.
Detailed information on the children and their families has been collected from a variety of sources, including self-completed questionnaires, data extraction from medical and educational records, in-depth interviews, and clinical assessments. Our data therefore contain a wide range of measures of child health and development, family background, family inputs and schools.
A total of 12,620 children survived past the age of 1 and returned at least one questionnaire. Of these, 642 were excluded because either their mother or father is of non-white ethnic origin, leaving 11,978 potential participants. Our sample selection process is as follows. First, we select those children for whom we observe all nine genotypes, leaving us with approximately 7100 children. Second, we drop children for whom we do not observe their height. Children were invited to attend specially designed clinics, where their anthropometric measures were recorded. As not all children attended these clinics, our sample sizes reduce to between 4594 (age 8) and 3867 (age 13). Finally, we restrict the sample to those children for whom we observe the outcome of interest, leading to a final sample size of around 3900 at age 8 and 3300 at age 13. We deal with missing values on other covariates by using multivariate imputation (Royston, 2004).

Outcome measures
We examine three sets of outcomes. First, we observe two measures of cognitive function. These are the child's score on the nationally set Key Stage 3 (KS3) exam (taken by all 14-year-olds educated in the state sector) and the child's IQ, measured at age 8. 8 Both measures are objective and comparable across all children. Increasing scores indicate better performance. It is important to note that IQ does not only measure 'innate' ability. Instead, our measure of IQ (WISC-III) is an index of general intellectual functioning, which is shaped by both inherited and acquired attributes, including any family and environmental influences. For example, there is evidence of differences in IQ between children from home environments of different quality and from different socio-economic positions (see e.g. Molfese et al. (1997) and references therein).
Second, we examine three measures of mental health or self-esteem: depression symptoms, scholastic competence and global self-worth. The latter two are measured at age 8, using Harter's Self-Perception Profile for Children (Harter, 1985), with increasing scores indicating higher self-esteem. The depression score is self-reported by the teenager at age 13 using the Moods and Feelings Questionnaire (Angold et al., 1995). Increasing scores indicate more depression symptoms.
Third, we examine the child's behavioral problems, as measured by the mother's report on the Strengths and Difficulties Questionnaire (SDQ; Goodman, 1997) administered at age 13. The SDQ has four sub-scores, which we examine separately (as is common in the literature). These are hyperactivity, emotional problems, conduct problems and peer problems. Increasing scores indicate increasing problems.
For comparability, all outcomes are standardized on the full sample of children for whom data is available, with mean 100, standard deviation 10.
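The rescaling can be sketched as below. Whether the population or the sample standard deviation is used is not stated in the text; the population version (`pstdev`) is an assumption here, and the function name is illustrative.

```python
from statistics import mean, pstdev


def standardize(values, target_mean=100.0, target_sd=10.0):
    """Rescale values to have the given mean and standard deviation.

    Uses the population standard deviation (an assumption; the sample
    version would differ only slightly in large samples).
    """
    m = mean(values)
    s = pstdev(values)
    return [target_mean + target_sd * (v - m) / s for v in values]


# Hypothetical raw scores with mean 5 and population SD 2.
scores = standardize([2, 4, 4, 4, 5, 5, 7, 9])
```

After this transformation a one-unit difference corresponds to one tenth of a standard deviation, which makes coefficients comparable across outcomes.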

Measures of child height and the genetic variants
We examine the effect of contemporaneous height on each outcome. Height is adjusted for the exact age in months at which it is measured and standardized to have mean 100, standard deviation 10. All measurements are taken by trained nurses. We instrument height with a set of SNPs that have been consistently shown to relate to stature. These are SNPs located in the following genes: HMGA2, ZBTB38, GDF5, LOC387103, EFEMP1, SCMH1, ADAMTSL3, DYM and C6orf106. All but two SNPs are located on different chromosomes (LOC387103 and C6orf106 are both on chromosome 6), with pairwise correlations between the SNPs in our sample ranging from -0.029 to 0.026. Hence, each SNP has an independent effect on stature.

Covariates
The main reason for the inclusion of covariates in economics IV studies is that the conditional independence and exclusion restriction are more likely to be valid. A second reason for including covariates is that it may reduce the variability in the dependent variable, leading to more precise estimates. In Mendelian randomization studies, however, the theory and evidence on the random allocation of genetic variants suggests that we can rely on the unconditional independence and exclusion restriction. In fact, the inclusion of covariates may bias the estimates of interest. For example, if the instrumented risk factor (here: height) has multiple causal effects, or if the outcome of interest has a causal effect of its own on the covariates, adjusting for such post-treatment variables may lead to biased estimates of the causal effect of interest. Under the independence assumption and exclusion restriction, and in a situation where the instrumented risk factor and outcome do not (directly or indirectly) affect these covariates, the unadjusted and adjusted IV estimates should be similar, though the latter may be more precise. We present the main findings both with and without adjustment for covariates. These show similar results, providing at least suggestive evidence that the instruments satisfy independence and exclusion.
In the analysis that adjusts for covariates, we control for a rich set of child and family characteristics, including the child's birth weight and the number of older and younger siblings under 18 in the household. As the outcomes of interest may vary with within-year-age, we also account for the child's age (in months) at the time the outcome is measured. We control for the family's socio-economic position with various measures: log equivalized family income and its square, four binary variables for mother's and father's educational level, the mother's parents' educational level, an indicator for whether the child is raised by the natural father, variables indicating the family's social class, and parents' employment status when the child is 21 months. As a further measure, we include a measure of small (local) area deprivation, as measured at the child's birth. 9 In addition to these generally observed controls, our data allow us to also account for several further measures of mother's health and behavior, which may be correlated with both child height and the outcome of interest. We use two binary variables which measure whether the mother smoked or drank alcohol in the first three months of pregnancy; an ordered indicator for the intensity of mother's breastfeeding (never, <1 month, 1-3 months and 3+ months); mother's age at birth (20-24, 25-29, 30-34, 35+); mother's 'locus of control', a psychological concept that describes whether individuals attribute successes and failures to internal or external causes (those with an external locus of control attribute success and failure to chance); two further measures of maternal mental health; and finally several measures of parental involvement or interest in the child's development. 10

Descriptive statistics
Table 2 presents mean height (at age 8) for each of the SNPs, distinguishing between children who are homozygous for the height-increasing allele, heterozygous and homozygous for the height non-increasing allele (see the glossary in Table 1 and the Appendix for some of the genetic terms used here). These show that each of the individual SNPs explains little of the variation in child height. This would imply that the first stage regressions have low explanatory power, which could result in biased estimates. To avoid such problems of low power, we create a count of the total number of height-increasing alleles carried by each child (as in e.g. Weedon et al. (2008); Lettre et al. (2008)). We use this in our main analysis as the instrument for child height. As shown by Pierce et al. (2010), combining genetic factors in this way alleviates weak IV problems. However, they also show that such counts are mainly appropriate when variants have similar effects, but suboptimal otherwise, as the effect sizes will be mis-specified. Indeed, a simple count of the number of risk alleles imposes structure, setting the magnitude of the effects of all alleles to be equal. As an alternative, we therefore check the robustness of our results in Section 5.3, using a weighted allele score, where the weights are the gender-specific strengths of the association between the variant and individual height, as estimated by a large genome-wide association study of 183,727 individuals in 61 independent datasets (Allen et al., 2010). In this section, we also investigate the robustness of our results to the use of different combinations of different sets of instruments.
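The difference between the simple count and the weighted score can be sketched as follows. The weights shown are hypothetical placeholders chosen for easy arithmetic, not the gender-specific estimates from Allen et al. (2010), and the function name and data layout are assumptions.

```python
def weighted_allele_score(genotype, weights):
    """Weighted sum of 'tall' allele copies.

    genotype: dict mapping SNP name to 0, 1 or 2 copies of the 'tall' allele.
    weights:  dict mapping SNP name to a per-allele effect size.
    Setting every weight to 1 reproduces the simple allele count.
    """
    missing = set(weights) - set(genotype)
    if missing:
        raise KeyError(f"genotype missing for: {sorted(missing)}")
    return sum(weights[snp] * genotype[snp] for snp in weights)


# Hypothetical two-SNP illustration with made-up per-allele weights.
genotype = {"HMGA2": 2, "GDF5": 1}
placeholder_weights = {"HMGA2": 0.5, "GDF5": 0.25}

weighted = weighted_allele_score(genotype, placeholder_weights)
unweighted = weighted_allele_score(genotype, {"HMGA2": 1, "GDF5": 1})
```

The weighted score lets variants with larger estimated effects contribute more, whereas the count treats every 'tall' allele as equivalent.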
The left panel of Fig. 1 presents a histogram of the number of 'tall' alleles carried by each child, showing a bell-shaped distribution. The linear prediction of height, obtained from a regression on the number of 'tall' alleles, is presented by the straight line. On average, each 'tall' allele increases the child's height at age 8 by 0.043 standard deviations (about 0.25 cm). There is, however, a considerable amount of unexplained variation in height (R² < 1%), as shown in the right panel of Fig. 1, where the linear prediction is presented by the same straight line.
Columns 1 and 2 in Table 3 present the descriptive statistics (mean, standard deviation) of the variables discussed above. This shows an average height at age 8 of 132.2 cm and of 163.3 cm at age 13. In the analysis, we use standardized heights. Columns 3-5 show the raw association between this measure, the covariates and the number of 'tall' alleles, obtained from a regression of standardized height or each covariate on the number of height-increasing alleles. The top two rows of these columns present the relationship between child height and the instrument, showing a strong relationship for height at both ages. On average, each 'tall' allele is associated with a 0.043-0.047 standard deviation increase in child height (recall from above that height is distributed with mean 100, standard deviation 10). The rest of columns 3-5 show no clear patterns or (with three exceptions) statistically significant associations in the relationship between the contextual variables and the number of height-increasing alleles. Using a two-sided binomial probability test at the 5% level, a comparison of the observed versus expected number of significant correlations suggests that the genetic variants show no greater association with the child and family background characteristics than what would be expected by chance (p = 0.15). Failing to reject the null, however, does not necessarily imply it is true. In other words, it does not guarantee that the instrument is orthogonal to any potential confounders, as it may be that the association is too small to detect with our sample size, or that we simply do not observe the relevant confounders. Nevertheless, it provides suggestive evidence that the instruments support Assumptions 1 and 2. 11
9 Family income is an average of two observations (when the child is aged 3 and 4) and is in 1995 prices. The educational indicators are: less than ordinary (O) level, O-level only, advanced (A) level that permits higher educational study, and having a university degree. We use the standard UK classification of social class based on occupation (professional (I), managerial and technical (II), non-manual skilled (IIInm), manual skilled (IIIm), semiskilled (IV) and unskilled (V)). The Index of Multiple Deprivation (IMD) is based on six deprivation domains, including health deprivation and disability; employment; income; education, skills and training; housing; and geographical barriers to services. Increasing IMD scores indicate greater deprivation. The IMD measure relates to areas containing around 8000 persons.
10 Maternal mental health is measured by the Edinburgh Post-natal Depression Score (EPDS) and Crown-Crisp Experimental Index (CCEI) at 18 weeks gestation. EPDS indicates the extent of post-natal depression; CCEI captures a broader definition of mental health, measuring general anxiety, depression and somaticism. Higher scores mean the mother is more affected. The mother's 'teaching score' is constructed from questions that measure whether the mother is involved in teaching her child (depending on the child's age) songs, the alphabet, being polite, etc. We use an average score from three measures at ages 18, 30 and 42 months to capture longer-term involvement. Likewise, a variable is included indicating whether the mother reads/sings to the child, allows the child to build towers/other creations etc., measured at age 24 months. Finally, we account for the extent to which parents engage in active (outdoor) activities with their children, such as going to the park or playground and going swimming.
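The comparison of observed versus expected numbers of significant associations can be sketched as an exact two-sided binomial test. This is an illustration under stated assumptions: the number of comparisons used below is hypothetical (the text reports only that three associations were significant), and the two-sided p-value is defined here as the total probability of outcomes no more likely than the observed one, which may differ from the exact routine used for the reported results.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_two_sided_p(k_obs, n, p):
    """Exact two-sided p-value: sum the probabilities of all outcomes
    that are no more likely than the observed count."""
    p_obs = binom_pmf(k_obs, n, p)
    tol = 1e-9  # guards against floating-point ties
    total = sum(
        binom_pmf(k, n, p)
        for k in range(n + 1)
        if binom_pmf(k, n, p) <= p_obs * (1 + tol)
    )
    return min(1.0, total)


# Hypothetical: 40 covariates tested at the 5% level, 3 found significant.
n_tests = 40          # assumption, not taken from the paper
n_significant = 3
p_value = binom_two_sided_p(n_significant, n_tests, 0.05)
print(p_value)
```

A large p-value here means that the number of nominally significant associations is in line with what chance alone would produce at the 5% level.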

IV falsification check
Another way to examine the robustness of our IV approach and the validity of our instruments is by undertaking a 'falsification check'. We do this in two ways. First, we examine the effect of height on an outcome for which we have clear theoretical reasoning that there should not be an effect. Second, we examine the effect of height on an outcome for which we have strong beliefs that there should be an effect. These approaches, also known in epidemiology as 'negative control' and 'positive control' methodology respectively, are increasingly adopted in the biomedical field (see e.g. Lipsitch et al., 2010). In the first test, we investigate the relationship between children's height and maternal educational level in an OLS and IV analysis. With evidence of a socio-economic gradient in height, we expect a positive association. However, there is no reason to believe there to be a causal effect, and hence, we expect the IV approach to remove this correlation.
Columns 1 and 2 of Table 4 present the results, showing strong positive correlations between maternal education and height in the OLS, which turn insignificant in the IV model. The IV point estimates are sometimes smaller and sometimes larger, with no clear patterns in size or sign of the effects of height measured at different ages. As expected, the standard errors are much larger in the IV, and we cannot reject the null of no effect. The large standard errors, however, also mean that the Durbin-Wu-Hausman (DWH) test does not reject, suggesting that we cannot distinguish the IV estimates from the OLS estimates.
11 To shed more light on whether the variants are likely to be related to other background characteristics, we also examine the relationship between the genetic variants and a wide set of further variables (64 additional pairwise comparisons) that are not included in our analysis (such as whether the child had sleeping difficulties, the child's 'locus of control', whether the mother had a cesarean section, mother's self-esteem, anxiety, depression, whether the family owns their own home, whether they have financial difficulties, etc.). The findings (available from the authors upon request) also suggest the genetic variants are unrelated to these other variables (using a two-sided binomial probability test, p = 0.77 at the 5% level).
In the second falsification check, we examine the effect of height on body weight. As these are highly (positively) correlated, particularly in children who are still growing (e.g. see any children's growth charts), we expect to find strong positive effects. Assuming that height is exogenous to body weight, we also expect the OLS and IV estimates to be similar, though the exogeneity of height in this setting is an assumption. 12 However, as shown by Tanner (1978) and discussed above, even with severe (prenatal or postnatal) malnutrition, children attain similar heights as their siblings or peers. Hence, assuming that height is exogenous to weight, a substantially different or null IV finding would cast doubt on our IV strategy.
Columns 3 and 4 of Table 4 show strong positive estimates of height on body weight at different ages in both the OLS and IV. A one standard deviation increase in height is associated with a 0.52-0.70 standard deviation increase in weight in the OLS, and a 0.21-0.93 standard deviation increase in weight in the IV. The point estimates are similar in both models, though the standard errors are again much larger in the IV. The Durbin-Wu-Hausman test shows that the majority of the IV estimates are indistinguishable from those estimated by OLS.
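The logic of the two falsification checks can be illustrated on simulated data. The sketch below is not the paper's estimation: every coefficient, the sample size and the data-generating process are assumptions made for illustration. With a single instrument, the IV estimate reduces to the ratio cov(z, y) / cov(z, x). In the 'negative control', the outcome shares an unobserved confounder with height but is not caused by it, so OLS is biased while IV is close to zero; in the 'positive control', height is exogenous and both estimators recover the assumed effect.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


def ols_slope(x, y):
    """Slope from a simple regression of y on x."""
    return cov(x, y) / cov(x, x)


def iv_slope(z, x, y):
    """Single-instrument IV (Wald) estimate of the effect of x on y."""
    return cov(z, y) / cov(z, x)


rng = random.Random(0)
n = 20000
# Instrument: count of 'tall' alleles over nine SNPs (18 alleles in total).
z = [sum(rng.random() < 0.5 for _ in range(18)) for _ in range(n)]
u = [rng.gauss(0, 1) for _ in range(n)]  # unobserved confounder
height = [0.3 * zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]

# Negative control: driven by the confounder, no causal effect of height.
neg = [ui + rng.gauss(0, 1) for ui in u]
# Positive control: true effect of 0.6 (assumed), no confounding.
pos = [0.6 * h + rng.gauss(0, 1) for h in height]

ols_neg, iv_neg = ols_slope(height, neg), iv_slope(z, height, neg)
ols_pos, iv_pos = ols_slope(height, pos), iv_slope(z, height, pos)
print(round(ols_neg, 2), round(iv_neg, 2), round(ols_pos, 2), round(iv_pos, 2))
```

In this setup the IV estimates are noisier than the OLS estimates, mirroring the larger standard errors reported in Table 4.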
Despite the imprecision of the IV approach, the two tests suggest that our instruments perform well. Although this does not guarantee that our IV approach also correctly identifies the causal effect on the other outcomes of interest such as depression or behavior, it does provide support for the argument that both the approach and the instruments are valid to obtain causal estimates of the effects of stature. In Section 5.3, we examine the robustness of these estimates to the use of different combinations of instrumental variables.
Table 3 Descriptive statistics of height and the covariates: Columns 1 and 2 show their mean and standard deviation. Columns 3-5 present the coefficients, standard error and p-value of the variables shown in the first column regressed on the instrument (a count of the number of height-increasing alleles).
Note: Rather than height in cm, the analysis uses standardized heights (with mean 100, standard deviation 10).
12 If a healthy (unobserved) diet positively affects height and negatively affects weight, the OLS estimates would be biased downwards.

OLS results
We begin by examining the OLS association between height, cognitive skills and mental health. Columns 1 and 2 of Table 5 show a positive association between height, test scores and IQ that halves when controlling for the background characteristics. The actual magnitude of the association is small: controlling for all covariates (the 'adjusted' results), a one standard deviation increase in height is associated, for example, with a 0.057 standard deviation increase in girls' IQ. Comparing this to the effect of within-school-year age on IQ in our data, this corresponds to a difference in test scores between children born approximately one month apart.
Columns 3-5 examine the relationship between height, the two measures of self-esteem and symptoms of depression. This shows that height is correlated with increases in self-esteem and depression scores, though the estimates are small and generally indistinguishable from the null (the positive association with depression symptoms for girls is the one exception).
Table 6 presents both the unadjusted and adjusted associations between height and behavioral problems. These show that height is unrelated to hyperactivity and conduct problems, but there is a negative correlation with emotional problems. The effects are again small: a one standard deviation increase in height is associated with a 0.06-0.07 standard deviation decrease in emotional problems. The results also show a small negative association between height and peer problems for girls.
Notes (Table 5): The estimates come from OLS regressions of the outcome on contemporaneous height by gender. The adjusted analysis includes controls for: birth weight, age in months, number of older and younger siblings, log family income and its square, mother's, father's and mother's parents' educational level, raised by natural father, social class, maternal age at birth, parents' employment status, IMD at birth, mother's smoking and drinking during pregnancy, breastfeeding, mother's 'locus of control' and mental health (EPDS and CCEI), parental involvement in child development, and their engagement in active activities with their child. * p < 0.1. ** p < 0.05. *** p < 0.01.
Table 6 OLS: The unadjusted and adjusted effects of contemporaneous height on behavior at age 13.
Notes: The estimates come from OLS regressions of the outcome on contemporaneous height by gender. Controls are listed in the note to Table 5. * p < 0.1. ** p < 0.05. *** p < 0.01.

IV results
Table 7 presents the IV results for cognitive skills and mental health. The unadjusted and adjusted analyses lead to similar conclusions (as expected, since Table 3 showed the instruments to be generally uncorrelated with the covariates). Our instrument predicts height well in all specifications, with a first stage F-statistic between 19 and 34 for boys, and 11 and 18 for girls, satisfying Assumption 3. 13 Columns 1 and 2 show the IV estimates for KS3 and IQ respectively. These are positive for girls, but indistinguishable from zero for boys. For girls, instrumented height has a large positive effect on both KS3 and IQ, and the Durbin-Wu-Hausman (DWH) test rejects the exogeneity of height. Despite the much larger standard errors, the IV estimate for girls is larger than the OLS, suggesting that the latter underestimates the true effect. We discuss possible reasons for this below.
Table 7 IV: The effects of contemporaneous height, instrumented by a count of the number of risk alleles, on cognitive skills and mental health.
Column headings include: (1) Key Stage 3, Age 14; (2) IQ test score, Age 8; (3) Scholastic self-esteem, Age 8; (4) Global self-worth, Age 8.
Columns 3-5 of Table 7 show that for self-esteem, global self-worth and depression symptoms, the large standard errors mean we cannot reject the null of no effect, though in contrast to the OLS estimates, all three sets of IV coefficients relate increasing height to worse outcomes.
Table 8 presents the IV results for behavioral problems. In contrast to the OLS results in Table 6, the IV estimates in Column 1 of Table 8 show height to be a predictor of hyperactivity in girls. A one standard deviation increase in instrumented height increases the hyperactivity score by about 0.5 standard deviations. Similarly, height appears to be a positive predictor of boys' emotional problems, with the DWH test rejecting the exogeneity assumption of height. Although not statistically significant, the estimated effect is only slightly smaller for girls' emotional problems. Finally, columns 3 and 4 show that height increases conduct problems and decreases peer problems for girls, whilst the opposite is found for boys. With large standard errors, however, we cannot statistically reject the null of no effect.

Instrument specification checks
We investigate the robustness of these results by using several instrument specification checks. First, we re-run the IV analyses using the weighted allele score as the instrumental variable, rather than the simple count of the number of risk alleles. The first as well as second stage results (available from the authors upon request) are very similar to those shown above, suggesting that the imposed structure on the instrument plays less of a role in this application. In fact, if we regress child height on each of the individual SNPs simultaneously, we cannot reject the null that the coefficients are equal to one another.
Second, we specify the nine SNPs as nine instrumental variables, rather than a count of the number of 'tall' alleles. As shown in Tables 9 and 10, this leads to a much weaker first stage, reducing the F-statistic to between 2 and 4. The point estimates remain similar to those reported above, though they are somewhat closer to zero. One difference is the estimate for girls' self-esteem. This was negative when using the allele count, but positive when using each SNP separately as an instrument. As we show below, this is probably due to the general imprecision with which these are estimated. The main results, however, are unchanged for both the unadjusted and adjusted regressions: height increases KS3 and IQ for girls (Table 9), and leads to an increase in behavioral problems (Table 10). In addition, the use of nine instruments allows us to test for over-identification using the Hansen J test, which we cannot reject in any of the specifications, providing suggestive evidence that the instruments are uncorrelated with the error term.
13 As a general test of gene-environment interactions, we explore whether our genetic variants are only expressed in specific environments, and therefore whether there is any direct evidence of violation of the monotonicity assumption. We estimate the first stage regression, interacting the genetic variants with indicators for various subgroups and test whether the instrument coefficient is the same across groups. The results (available from the authors) show no more significant differences than what would be expected by chance, providing suggestive evidence that gene-environment interactions do not play an important role for the genetic variants used here.
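Why splitting the allele count into nine separate instruments weakens the first stage can be seen from the textbook formula for the joint F-statistic. The sketch below assumes a first stage with no other covariates and holds the explained variance fixed; the R² and sample size are hypothetical values for illustration, not the paper's estimates, and in practice nine free coefficients would raise R² slightly.

```python
def first_stage_f(r_squared, n_obs, n_instruments):
    """Joint F-statistic for the instruments in a first stage that
    contains only an intercept and the instruments themselves."""
    explained = r_squared / n_instruments
    unexplained = (1.0 - r_squared) / (n_obs - n_instruments - 1)
    return explained / unexplained


# Hypothetical values, chosen only for illustration.
r2 = 0.008
n = 3000

f_single = first_stage_f(r2, n, 1)  # allele count as one instrument
f_nine = first_stage_f(r2, n, 9)    # nine SNPs entered separately
print(round(f_single, 1), round(f_nine, 1))
```

For a given amount of explained variation, the F-statistic falls roughly in proportion to the number of instruments, which is in line with the drop described above.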
Finally, as discussed in Section 3.2.2, it is possible to examine genetic confounding through pleiotropy (i.e. variants influencing multiple pathways) or LD (i.e. variants being co-inherited) in more detail, using multiple combinations of genetic variants in different IV specifications. We investigate this here by estimating multiple IV models in which, each time, the instrument is defined by a different set of SNPs. We run a different IV regression for all possible sets of instrumental variables, leading to a total of 511 regressions for each outcome. 14 Obtaining similar estimates with different instrument sets would provide evidence against genetic confounding and increase the confidence in the validity of our findings.
Figs. 2 and 3 plot the point estimates from the IV regressions with different instrument sets, where the horizontal axis represents the IV estimate. 15 This shows a clear positive effect of height on KS3 and IQ for girls (the dashed line), with a negative or null effect for boys (the solid line). The sometimes long flat tails of the densities reflect estimates with a first stage F-statistic between 1 and 2, for which the estimates are more volatile. Excluding these weaker estimates removes the flat tail. The effect of height on scholastic competence and depression symptoms is generally zero for boys, with girls showing a slightly more positive effect on depression symptoms. In general, the estimates for girls are slightly more variable, which is likely due to their smaller first stage F-statistic, which also explains the different findings for self-esteem in Tables 7 and 9. Examining child behavior, the estimates show a clear increase for boys in emotional and peer problems, and a decrease in conduct problems, with no obvious effects on hyperactivity. The effects for girls are slightly more variable, but suggest height increases hyperactivity, emotional and conduct problems, but decreases peer problems.
Table 10 IV: The effects of contemporaneous height, instrumented by the nine SNPs simultaneously, on behavior at age 13.
14 We generate all possible subsets of k SNPs from the total of n (nine) elements, where $k = 1, \ldots, 9$. For example, when using sets of five of the nine SNPs, there are 126 unique combinations (ignoring the ordering of the SNPs): $\frac{n(n-1)(n-2)\cdots(n-k+1)}{k(k-1)(k-2)\cdots 1}$, or $(9 \times 8 \times 7 \times 6 \times 5)/5!$. We repeat this for all k-combinations, leading to 511 possible instrument sets.
15 We exclude estimates with a first stage F-statistic less than 1. The plots for the two self-esteem measures (scholastic competence and global self-worth) look similar; we plot only the former (the graph for self-worth is available on request).
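The count of instrument sets can be checked by enumerating every non-empty subset of nine SNPs. In this short sketch the SNP labels are placeholders, not the variant identifiers used in the analysis.

```python
from itertools import combinations
from math import comb

# Placeholder labels for the nine SNPs.
snps = [f"SNP{i}" for i in range(1, 10)]

# Every non-empty subset, from single SNPs up to all nine together.
instrument_sets = [
    subset
    for k in range(1, len(snps) + 1)
    for subset in combinations(snps, k)
]

n_sets = len(instrument_sets)
n_sets_of_five = comb(9, 5)
print(n_sets, n_sets_of_five)  # 511 126
```

Each element of `instrument_sets` would define the instruments for one IV regression; the total equals 2^9 - 1, since only the empty set is excluded.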
For comparison, Figs. 4 and 5 present the point estimates from IV regressions with different instrument sets for the two falsification checks discussed above and shown in Table 4. The 'negative control' clearly shows a spike at zero for both boys and girls, confirming the absence of any effects of height on maternal education. The 'positive control' also confirms what we find above: height in both boys and girls increases their weight. Overall, these analyses do not provide evidence against the validity of the IV assumptions.

Non-linearities
As discussed in Section 2, the existing literature has found both tallness and shortness to have negative psychological effects in children. The estimates discussed above only examine differences in the outcome of interest at the mean, but the relationship between height and the outcomes may differ at different points in the distribution. We therefore investigate different cut-points and examine the effects of being below the 25th and above the 75th percentile of the age- and gender-specific height distribution. The results (available upon request) confirm our main findings. IV estimates show that shorter girls have lower IQ and do worse in school tests, and vice versa for taller girls, but there is no evidence of a relationship between height and scholastic competence, self-worth or depression. The IV effects of being tall or short on the child's behavioral problems also show similar patterns to those above: relatively tall girls are more hyperactive, and have fewer emotional and peer problems, though with the large standard errors, the latter is not significant at conventional levels. For boys, height increases emotional and peer problems, and decreases conduct problems.

Discussion and conclusion
This paper is the first to exploit genetic variation in height to examine the causal effects of height on human capital accumulation. OLS results show that taller children perform better in terms of cognitive performance and are less likely to have emotional and peer problems (girls), though tall girls are more likely to show symptoms of depression. Using genetic variation in height in an IV specification, we attempt to deal with the problems of endogeneity. The IV findings for girls are similar to the OLS for cognitive performance, showing a positive effect of height on KS3 and IQ. However, we do not find this for boys, where the results are indistinguishable from zero. We also find no effects of height on self-esteem and depression symptoms. In addition, we find a negative relationship of height with behavior. This suggests that the OLS results are downwardly biased and that height increases rather than decreases these behavioral problems. Taller children are more hyperactive and are more likely to have emotional problems. In addition, taller boys are more likely to have peer problems, though there is a negative relationship for girls.
This suggests that height is endogenous to cognitive performance and behavior, though perhaps less so to self-esteem and depression symptoms, for which the OLS and IV estimates do not differ substantively. We are unsure why height would be endogenous to some, but not other outcomes. This may simply be due to the large standard errors, precluding us from making more precise inferences. Alternatively, it may be that unobserved factors such as pre- and postnatal nutrition affect cognitive functioning and behavior, but not self-esteem or depression. We cannot distinguish between such potential explanations.
In many of our results, the IV estimates suggest that OLS is biased downwards. One possible explanation for the difference between IV and OLS could be a genetic one. For example, one or more of our SNPs could be pleiotropic or in LD with another variant that directly affects IQ or cognition. Although our tests of associations with known confounders, our falsification checks, the 'multiple IV test', and the scientific literature do not give any reason to expect this to be the case, we cannot rule this out. For instance, it may be that our sample is too small to detect any association between the SNPs and the covariates, and it may be that any pleiotropic effects have simply not yet been identified, or that we do not observe the relevant confounders. From the evidence discussed in Section 3.2 and from the fact that we use only nine SNPs out of possibly hundreds or thousands of SNPs coding for height, we assume that our assumptions hold. However, as in any other IV study, we cannot directly test this, and it remains an assumption.
A possible explanation for our IV findings that indicate that being taller increases rather than decreases behavioral problems could be the differential treatment of children of different stature. A 'size-appropriate' rather than 'age-appropriate' treatment of tall children may trigger behavioral problems. Expectations of, and reactions to, the behavior of 'tall-for-age' children (which may seem childish for their size) can in turn affect children's development. As factors such as socio-economic position are positively related to height and negatively related to behavioral problems, the OLS estimates will be downward biased if these factors are insufficiently controlled for. Though possible, these explanations are speculative, as we currently have no further evidence to confirm them. However, the finding of increased behavioral problems is consistent with the psychological literature that has shown a positive relationship between height and children's behavioral problems, though this literature has mainly examined outcomes such as aggression and violence (Raine et al., 1998; Farrington, 1989) rather than those we examine here.
Finally, the IV effects for behavior and IQ are large: a one standard deviation increase in height raises these scores by about 0.2-0.7 standard deviations. Comparing these effects with those of other child characteristics shows they are substantial. For example, a 0.4 standard deviation difference in girls' IQ (Table 9) is comparable to the difference in this score for girls born approximately 6 months apart within the same school year. Likewise, the difference between girls' and boys' raw hyperactivity scores is approximately 0.37 standard deviations which is similar to the estimated effect of one standard deviation increase in height on hyperactivity for girls.
In conclusion, our findings suggest that height is an important factor in children's human capital accumulation in both childhood and adolescence, most likely as a result of the social reactions that are triggered by variations in height. We show that being tall may not only confer advantage but also disadvantage. Our examination of behavioral problems contrasts with the more positive view of height that emerges from the existing empirical literature on height and children's cognitive performance.

Funding
The UK Medical Research Council (MRC), the Wellcome Trust and the University of Bristol provide core support for ALSPAC. G.D.S. and D.A.L. work in a centre that receives funding from the UK MRC (G0600705) and University of Bristol. Funding from four grants supporting the specific work presented here is gratefully acknowledged: two from the UK Economic and Social Research Council (RES-060-23-0011 and PTA-026-27-2335) and two from the UK MRC (G0601625 and G1002345). No funding body influenced data collection, analysis or its interpretation. This publication is the work of the authors, who will serve as guarantors for the contents of this paper.
nucleotides): adenine (A), cytosine (C), guanine (G) and thymine (T). DNA consists of two strands of bases, which, for ease of understanding, have been likened to a zip in the way they fasten together. These two strands are held together such that A on one strand can only be linked to T on the other. Similarly, G can only be linked to C. DNA is stored in structures called chromosomes, where each chromosome contains a single continuous piece of DNA (made up of the two strands). A gene is a section of the chromosome that consists of a segment of DNA, i.e. a set of base pairs in a particular order.
All cells in the human body apart from germ cells contain 46 chromosomes, organized into 23 pairs: one copy of chromosome 1-22 from each parent, plus an X-chromosome from the mother and either an X or Y chromosome from the father.
About 99% of the DNA of any two unrelated individuals is identical. Locations where DNA varies between people are called polymorphisms. The most commonly studied form of polymorphism is a Single-Nucleotide Polymorphism (SNP): a change in just one base (nucleotide; one of the four molecules that form the codes of DNA) on the DNA sequence. As chromosomes come in pairs (one from each parent), humans have two such bases at each position, called alleles. These alleles can either be the same or different. The term genotype describes the specific set of alleles inherited at a particular location. For example, individuals can have one of three genotypes of the SNP in HMGA2 that we use as one of our instruments. They can be:
1. Homozygous for the common allele (having two of the same common (most prevalent/typical) alleles; for HMGA2, this is denoted by TT)
2. Heterozygous (having one common and one rare allele: CT)
3. Homozygous for the rare allele (having two of the same rare alleles: CC)
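The three genotype classes can be turned into the numeric coding used when constructing an allele count. The sketch below is illustrative only: the function names are our own, and counting copies of the rare allele is an arbitrary choice made for the example, since which allele is height-increasing is not stated in this glossary.

```python
def allele_copies(genotype, allele):
    """Number of copies (0, 1 or 2) of `allele` in a two-letter genotype."""
    if len(genotype) != 2:
        raise ValueError("a genotype consists of exactly two alleles")
    return genotype.count(allele)


def describe_genotype(genotype, rare="C"):
    """Name the genotype class, given which allele is the rare one."""
    labels = {
        0: "homozygous for the common allele",
        1: "heterozygous",
        2: "homozygous for the rare allele",
    }
    return labels[allele_copies(genotype, rare)]


# The three HMGA2 genotypes listed above (T common, C rare).
for g in ("TT", "CT", "CC"):
    print(g, allele_copies(g, "C"), describe_genotype(g))
```

Summing such per-SNP copy numbers over all nine SNPs, using the height-increasing allele at each position, gives the allele count used as the instrument.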