Simulating the evolution of height in the Netherlands in recent history

ABSTRACT The Dutch have a remarkable history when it comes to height. From being one of the shortest European populations in the 19th Century, the Dutch grew some 20 cm and are currently the tallest population in the world. Wealth, hygiene, and diet are well-established contributors to this major increase in height. Some have suggested that natural selection may also contribute to the trend, but evidence is weak. Here, we investigate the potential role of natural selection in the increase in height through simulations. We first ask what if natural selection was solely responsible for the observed increase in height? If the increase in average height was fully due to natural selection on male height, then across six consecutive generations, men who were two standard deviation above average height would need to have eight times more children on average. If selection acted only through those who have the opportunity to reproduce, then reproduction would need to be restricted to the tallest third (37%) of the population in order to give rise to the stark increase in height over time. No linear relationship between height and child mortality is able to account for the increase over time. We then present simulations based on previously observed estimates of partnership, mortality, selection and heritability and show that natural selection had a negligible effect (estimates from 0.07 to 0.36 cm) on the increase in height in the period 1850 to 2000. Our simulations highlight the plasticity of height and how remarkable the trend in height is in evolutionary terms. Only by using a combination of methods and insights from different disciplines, including biology, demography, and history are we potentially able to address how much of the increase in height is due to natural selection versus other causes.


Introduction
Spare a thought for the Dutch. Beds are just that bit too short, so their feet must dangle over the end, beyond the reach of the bedclothes. It is all too easy to walk straight into a low hanging door lintel and should a visit to the hospital be needed as a result, many are now too tall to fit easily into a standard ambulance. These are, quite literally, First World problems: the Dutch population is, on average, the tallest in the industrialised West, and far outstrips populations in the Global South. Such a state of affairs came about surprisingly recently, however. Prior to the mid-19 th century, the Dutch were one of the shortest populations in Europe, reaching only 165 cm on average (for men; Figure 1). There then followed 200 years in which height increased by an impressive 20 cm, rising to an average of around 184 cm for men and 171 cm for women (Schönbeck et al., 2013). This has not been a steady increase in height over time, as during the early part of the 19 th century, average height actually decreased for a period (in response to a subsistence crisis, caused by the rising price of dairy products, successive crop failures, and the effects of the Crimean war on grain imports, plus increased incidences of the infectious diseases smallpox, typhoid and cholera; Drukker & Tassenaar, 1997). Dutch adolescents only started to become exceptionally tall after 1857 (Drukker & Tassenaar, 1997), and it was the birth cohorts of the 1870s that began the secular trend in height that has persisted through to the present day (Tassenaar, 2019). Low levels of social inequality (in terms of both income and healthcare) and the impact of a high-quality diet rich in calcium, have all been identified as contributing to this increase in height (de Beer, 2012;Jones & Bird, 2014;van Staveren et al., 1985).
While it is clear that height differences within and between populations are largely driven by environmental factors (Floud et al., 2011;Steckel, 2009), there is also evidence to suggest that genetic differences can account for at least some inter-population variation (Grasgruber et al., 2014(Grasgruber et al., , 2016Hruschka et al., 2019;Robinson et al., 2015;Turchin et al., 2012; see also Stulp and Barrett (2016) for review). This is most notable in pygmy populations (i.e. those where the adult male height is 150 cm or less)-largely because this is where most research  Baten and Blum (2012), and Hatton and Bray (2010). Because estimates of height are often based on conscription records which were men only, there are fewer estimates available for female height particularly in earlier times. Risk & Collaboration 2016) also present a curve for women which is nearly identical to the one from men above, except shifted vertically by ~13 cm. See also endnote 1 . effort has been focused. Both Becker et al. (2011) and Jarvis et al. (2012) compared the Baka people of Cameroon to their taller non-pygmy neighbours, and found that higher levels of genetic admixture between the Baka and their neighbours were associated with an increase in the height of Baka individuals. Other evidence for genetic differences in height across populations comes from Cho et al. (2009) who showed that one of the genes shown to have a large effect on height in Europeans (HMGA2) was also associated with stature in a Korean population, although the height-raising effect was much smaller and the frequency of the allele associated with increased height was lower. This suggests that genetic differences may partly account for the shorter average heights of Koreans compared to Europeans. Differences in average height between northern and southern Europeans have similarly been attributed to genetic differences between these populations (Robinson et al., 2015;Turchin et al., 2012). Moreover, these genetic differences were likely to have come about through natural selection (although these findings have been disputed: Berg et al. (2019); Sohail et al. (2019)) The idea that natural selection has produced height differences across countries is consistent with the findings that selection acts on height in contemporary populations (Byars et al., 2010;Sanjak et al., 2018;Stulp & Barrett, 2016;Stulp et al., 2015). Furthermore, differences in the direction of selection may account for differences across populations. In the US, for example, selection seems to favour shorter heights across both men and women (Byars et al., 2010;, whereas it favours taller heights in the Netherlands . Taken together, these between-population differences suggest that temporal shifts in average height could also reflect the action of natural selection. Caution is warranted here, however, as most evidence on selection in humans is based only on contemporary phenotypic associations between height and number of surviving offspring (see Byars et al. (2010); Sanjak et al. (2018) for notable exceptions that involve pedigree analyses). There are many reasons why these associations cannot simply be used to quantify the effect of natural selection on trends in heights (some of which will be explained later), perhaps the most obvious one being that evidence of natural selection acting in the present day does not warrant the inference that selection was also acting, or acting in the same way, during past centuries. Moreover, it is also apparent that the response to selection of observed selection gradients could only minimally account for the increase in height. Indeed, Tarka et al. (2015) criticised Stulp et al. (2015) for what they viewed as an inappropriate emphasis on the likely impact of natural selection on height and, using the breeder's equation (Lush, 1937), showed that any such effect was likely to be minimal, namely 0.28 mm in 150 years.
Such a criticism is entirely valid, and puts the findings of Stulp et al. (2015) in perspective. However, calculating the response to selection involves making a range of assumptions about how height may have varied throughout history that are -and perhaps never can be -verified. Accurately determining the likely genetic response to selection requires, at a minimum, a magnitude of selection and a measure of genetic heritability, neither of which is easy to establish for historical populations, where the relevant data are scarce, if not absent altogether. Even in contemporary populations, where heritabilities for physical traits, such as height, are well-established through diverse methods (McEvoy & Visscher, 2009;Visscher et al., 2008), estimates of selection are much harder to come by. Moreover, it is the genetic and not the phenotypic correlation between the trait and fitness that is necessary in order to calculate the genetic response to selection (Sanjak et al., 2018;Tropf et al., 2015). For example, a phenotypic correlation can exist between height and fitness, because parents can provide an environment that is conducive to both growth (resulting in taller height) and reproduction (resulting in higher fitness) that can be completely independent of the heights of the parents or the genetic variants related to height in the offspring. In this case, the phenotypic association between height and fitness would suggest natural selection; however, no genetic response would occur (i.e., no average increase in height in the next generation) because the genetic variants that are associated with increased height across individuals have nothing to do with the increased fitness of taller individuals (see also Morrissey et al., 2010)). Even sophisticated pedigree (Byars et al., 2010) and genetic analyses (Sanjak et al., 2018;Tropf et al., 2015) that permit the calculation of a genetic response nevertheless make strong assumptions concerning fitness. For instance, that it can be proxied by number of children ever born and that the timing of births and population dynamics can be ignored.
In other words, we face a momentous task if we wish to determine empirically the extent to which secular trends in height reflect the influence of natural selection relative to other causes. Simulation studies, however, can help in this regard (Smaldino, 2017). Here, we can vary the value of relevant parameters (e.g. strength of natural selection, magnitude of heritability), allow natural selection to run its course on artificial populations, and examine its effects on future generations. By comparing model inputs and outcomes to observed parameters across history (e.g. marriage rates, % childless, child mortality), we can place boundaries on what natural selection would be able to achieve and how this may have varied across time. Here, we use simulation models to investigate the history of the Dutch height increase. Before explaining the model and the relevant parameters, it is useful to offer a brief sketch of what was happening in the Netherlands during the period we wish to consider.

Environmental variation and stature in the Netherlands
Even in a small country like the Netherlands, there were substantial differences in nutritional intake in the 19 th century, which varied over time with modernization (Tassenaar, 2019). Before mechanical refrigeration, the transport of certain foods, such as dairy products, fruit and vegetables, could not be achieved without a loss of food quality, such that larger cities and urbanized areas experienced lower food availability than rural areas. In addition, there were differences in the extent and scale of transport networks within the country. Coastal areas to the west and north had a well-connected canal system, but there was no such transport network among the more inland provinces. Food was therefore exported less, or not at all, from inland regions. In the early parts of 19 th century, then, more rural, inland regions would have experienced greater food availability than the more urban and market-oriented coastal provinces (Tassenaar, 2019). Although data are sparse in the rural areas at this time, the average height of conscripts in Amsterdam was far below the level of the southern provinces, which supports the idea that pre-modernization rural areas experienced a higher standard of living compared to urban centres (Komlos, 1998). De Beer (2010) shows similar regional patterns for both men and women, based on a database of prison detainees.
A period of modernization between 1840 and 1870 served to change this regional pattern (Zanden and van Riel (2000) as cited in Tassenaar (2019)). There were improvements in agricultural and industrial production, which increased the overall availability of food, and new preservation techniques increased food quality and enabled it to be transported over larger distances. There were also changes to transport infrastructures, such as trains, steam-shops, paved roads, and additions to the canal system (Tassenaar, 2019). In line with this, rural provinces began to fall behind with respect to biological living standards compared to the coastal provinces in the second half of the 19 th century, as evidenced by conscripts from coastal areas becoming taller than their inland counterparts, and a reduction in the 'urban penalty' on height (Tassenaar, 2019). Indeed, with improvements to improved sanitation and medical care, as well as better and cheaper foods (Komlos, 1998;Steckel, 2009), the urban penalty became an urban premium, and city dwellers gained a sustained height advantage, at least in middle-sized and smaller cities (de Beer, 2010; Tassenaar, 2019).

Height differences and social status
Measures of height in relation to family of origin are rare in the historical record, making it all but impossible to get an accurate picture of the impact of social status on growth across the Netherlands as a whole. Beekink and Kok (2017), however, provide an intriguing glimpse of the likely variation that existed in their study of heights recorded in the small town of Woerden in the first half of the 19 th century (1815-1867). Their unique dataset, based on registration records for military service and the civil guard, includes records of male height during both late adolescence (19 years) and adulthood (25 years), and also provides information on family of origin. Families could be divided into three classes of increasing status: unskilled brick and tile workers, farmers and the middle class/elite. At age 19, sons of unskilled workers were on average 162 cm tall (SD = 8.3 cm), approximately 8 cm shorter than the sons of farmers (170 cm, SD = 8.8 cm), and 6 cm shorter than the sons of the wealthy elite (168 cm, SD = 7.3 cm). By age 25, 'catch-up' growth had reduced these differences slightly: the sons of unskilled workers showed an average of 5.1 cm of catch-up growth to reach an adult height of 167 cm, compared to 175 cm for farmers and 171 cm for the elite (i.e. 3.3 cm of catch-up growth). Nevertheless, it is apparent that lower social status (which included heavy labour during childhood) had a lasting effect on height among men in this town.

Marriage, fertility, and child mortality
Data on the likelihood and age of marriage, child survival, and fertility rates are useful for any evolutionary analysis, and can help inform our models and the interpretation placed on them. With respect to marriage, John Hajnal (1965) famously bisected Europe along a line running from St Petersburg to Trieste: areas to the west of line were characterised by relatively late age at marriage, relatively low levels of fertility, and a relatively high proportion of permanent celibacy compared to areas east of the line where early age at marriage accompanied by high fertility was the norm. In the former, restrictions on marriage helped control population growth, where in the latter high levels of mortality countered population increases. The Netherlands of the 19 th century fits the Western European Marriage Pattern (WEMP) as Hajnal described. Average age at first marriage for women ranged between 25.2 to 25.7 years and was 28.1-28.2 years for men between 1800-1899 (Störmer et al., 2017). The highest marriage age for women was found in the south of the country, which increased from an average of 24.5 to 27.1 years across this period, whereas the average age at marriage in other regions remained below 26 years. Among men, the highest age of marriage was found in the east of the country at the beginning of the 19 th century (29.2 years), but they were overtaken by the southern men over the course of the next 100 years, with age at marriage increasing from an average of 28.8 to 29.6 years, while marriage age in the east fell to 28.7 years. Men and women in the east nevertheless consistently married later than those in the west and the north, where the average age at marriage ranged between 26 and 27 years. Age at marriage was consistently higher for men in rural areas than in urban areas (28.5 years versus 27.5 years, respectively, by the end of the century), whereas for women this pattern was reversed, although the differences were smaller than those for males (25.7 years versus 26 years, respectively, by 1899) (Störmer et al., 2017). There were also considerable levels of permanent celibacy: between 1830 and 1930, Engelen and Kok (2003) report that 15.5% of the women and 14.5-15.5% of the men had never married by the age of 40-44. Again, there were regional differences with provinces in the South (Limburg and N. Brabant) experiencing a higher than average proportion of never-married individuals. Urban areas also had a lower proportion of never-married individuals than rural areas, although this varied by size: 16.4% of the population were never-married in towns with a population of 5000 or less, whereas the value in the largest cities was only 9.45% in 1899.
A study examining Dutch men in the second half of the nineteenth century showed that height was associated with getting married but not with age at marriage (Thompson et al., 2021). The evidence points towards a marriage penalty for being short, rather than any benefits to tall height: the shortest 20% of the men had a ~ 50% lower likelihood of marriage compared to average height men, but there were no further advantages to taller heights (the tallest 20% even had the second lowest likelihood of marriage). The authors speculate that the absence of an effect of height on marital age exists because height is also associated with factors that delayed marriage in those time periods, such as wealth (Engelen & Kok, 2003;Thompson et al., 2019). A follow-up study on a sample of Dutch men born between 1850 and 1900 also showed that taller male height was not associated with having more children and that, if anything, below-average height men were most likely to have large families (Thompson et al., 2022).
With respect to fertility rates, Wintle (2000) gives values of 33.1 births per 1000 average annual population between 1840-1849 rising to a peak of 36.2 per 1000 in the decade between 1870-79. Births then show a decline to 30.9 per 1000 at the turn of the century, reaching a value of 22.0 by 1950-55. Hendrickx (1997), focusing on the Twente region and presenting data on the communities of Borne and Wierden, gives values for the completed family size of 4.06 and 4.34, respectively, during the period 1831-1840, and 4.25 and 4.40, respectively, between 1871 and 1880. With respect to mortality Wintle (2000) reports that deaths under 1 year rose from 18.2 per 100 births in 1840-49 to a peak of 20.7 per 100 in 1865-1874, and then underwent a steady decline to 2.3 births per 100 in 1950-55. These figures represent the ongoing effects of the demographic transition and a move from a high fertility, high mortality regime to a low fertility, low mortality regime.

Simulating evolution
Through simulations in R (R Core Team, 2018) we model the evolution of Dutch heights from 1850 to 2000. We first give a quick overview of the model before explaining each step in more detail (see also Figure 2). Each model run consists of the following. First, an initial population is generated with 1000 men (or agents) living in the year 1850. The only trait these agents have is height, which is randomly generated from a distribution with an average height of 165.3 cm as observed in 1850 1 and a standard deviation of 7 2 ( Figure 1). Second, agents find a partner (or not) and the partner also has a height. Mating is either random, based on mating preferences or based on assortative mating. Third, the couple gives birth to a certain number of children. 3 Male height contributes to how many children are produced, depending on estimates of selection either observed in the literature or depending on counterfactual (what-if) questions. Not all children survive, and in counterfactual situations mortality is made dependent on male height. Fourth, the height of each child is determined by the height of both parents. This new generation of children and their heights serves as the next generation and there is no partnering with members from earlier generations. After the parental generation in 1850, 6 novel generations are formed with a generation time of 25, ending in 2000. In essence, this means that everyone that reproduces does so at age 25, and dies right after. The starting population is 1000 agents unless otherwise specified, and we ran each model 1000 times. 4 R code to produce all simulations and supplementary materials can be found at: https://doi.org/10. 34894/UHUWP9.

Who finds a partner?
Agents find a partner (or not) by way of two types of mating patterns. First, there is random mating, where all agents pair with a partner whose height is randomly determined. Second, we vary the height threshold deciding who gets to pair with a partner and who does not (e.g. 10% shortest do not find a partner, only 20% tallest find partner). Steps that an agent (male) goes through in the model. The only trait an agent has is height. Each simulation starts with 1000 agents. The offspring produced forms the next generation of agents. Each test is replicated 1000 times. Parameters vary across scenarios (see text).

Who will have children and how many?
Depending on whether a partnership is formed, an agent will reproduce. The number of children is dependent on the height of the father and the association between male height and relative fitness. We examine how large the selection differential needs to be to fully account for the increase in height. We also implement magnitudes of selection differentials that are observed in the literature, as well as magnitudes of selection on animal traits as observed in natural populations. In another set of simulations, we examine the interplay between population child mortality levels and the magnitude of the association between father's height and child mortality to determine which combinations of these factors would be needed to account for the increase in height. A major limitation of our simulations is that we do not include the effect of female height on fitness outcomes.

How tall will offspring be?
The average height of parents, or midparental height, determines the height of the offspring depending on the assumed heritability. Heritability is a population-specific estimate of how much variation in height in a population can be explained by genetic differences between people. In contemporary populations, this is 80% for height (McEvoy & Visscher, 2009;Visscher et al., 2008). Note that this measure is as much a measure of the explanatory value of genetic differences as one of homogeneity in environmental effects on height (which would lead to less variation in height due to environmental differences).
An offspring's height for an agent is determined by multiplying the midparental height by the heritability, 5 plus the addition of random error drawn from a normal distribution with a mean of zero and a standard deviation equal to the square root of one minus heritability. This random error can be seen as a combination of environmental variation and variation due to non-additive genetic effects (Falconer & Mackay, 1996). Applying this technique to many midparental heights, and subsequently regressing offspring heights (dependent variable) on midparental heights (independent variable) would lead to a linear regression estimate that is equal to the heritability (Falconer & Mackay, 1996;Visscher et al., 2008).
An important assumption is that in each generation of offspring, the heights of the offspring are scaled such that the standard deviation in heights is 7 cm. This means that in each generation, there must be mechanisms at play that increase genetic variation to similar levels to that of the preceding generation. This assumption may be fine when selection on height is minimal but is problematic in the case of strong selection because strong selection on a heritable trait is expected to reduce genetic variation in that trait. Reduced genetic variation in turn limits the response to selection. Thus, by artificially maintaining levels of variation in height similar in each new generation, we are also inflating the response to selection.

Predicting the response to natural selection
Here, some terminology from the evolutionary literature is useful. The predicted average change in a trait after one generation in a population produced by natural selection on that trait, R, is often captured via the breeder's equation (Lush, 1937): in which h 2 refers to the narrow-sense heritability, and S refers to the selection differential. The narrow-sense heritability, sometimes referred to as the heritability due to additive genetic effects, expresses to what extent children's phenotypes (in our case height) are determined by the genes transmitted from the parents, and it determines the degree of resemblance between relatives (Falconer & Mackay, 1996, p. 126). The selection differential, S, refers to the covariance between the trait of interest and relative fitness. 6 The breeder's equation, as the name suggests, comes from breeding experiments in which environmental differences between individuals could be tightly controlled. Nature is messier, and non-random mating, overlapping generations, changing environments, gene-environment interactions, and gene-gene interactions all complicate the use of the breeder's equation in natural populations (Heywood, 2005;Morrissey et al., 2010). The equation thus holds only under a particular set of circumstances. Luckily, we can guarantee these in our simulations.
An important component of predicting the response to selection is the selection differential S, which measures the strength of the association between the trait and fitness. More specifically, it represents the covariance between the trait and relative fitness (with a mean of 1; or the number of children divided by the average number of children), which is reported in many human and non-human studies. Parameter estimates of a Poisson regression, often reported in human studies, provide a good approximation to the selection differential (Tarka et al., 2015). In a multivariate framework, when control factors are added to regression models, S still is the regression coefficient on relative fitness, but now it is referred to as the selection gradient (Lande & Arnold, 1983).
Given the tried and tested usefulness of the breeder's equation in predicting evolutionary change under controlled conditions (not so much under natural conditions), why perform simulations? There are two main reasons. First, running multiple simulations provides insight into the variation in outcomes that can result from observed selection differentials. Second, adding different mating patterns and adding differential effects of selection through partnering versus mortality cannot be so straightforwardly calculated via the aforementioned equation. A major benefit is that we can verify the correctness of our simulations with the breeder's equation in the initialisation and development of the simulation model and subsequently interrogate the conditions under which the breeder's equation breaks down.

The various scenarios that we simulate
Our main aim is to get a reasonable estimate of the extent to which natural selection is responsible for the increase in height in the last 150 years. In doing so, we first ask: what if the upward trend in height in recent history was entirely due to natural selection?. We present three scenarios: A) how many more children should taller men have relative to shorter men to account for the trend; B) what level of child mortality and how strongly should mortality be affected by paternal height for it to explain the trend in height; C) if a fraction of the men in the height distribution were not allowed to reproduce, how large would this fraction need to be to account for the upward trend. These analyses are informative because they provide a background to a situation in which more realistic scenarios are modelled, and they highlight the unique trend in height in evolutionary terms.
We then present scenarios in which we simulate to what extent observed selection differentials in the literature can account for the upward trend in height (scenario D). We conclude by presenting a simulation that is informed by demographic outcomes across time (e.g. average fertility, mortality), estimates of heritability across time, and estimates of the effects of height on fitness outcomes across time (scenario E). This will be our best estimate of the magnitude of natural selection in explaining the upward trend in height.

Scenario A: what if it was all natural selection?
What if the increase in height of 17.2 cm (Figure 1) between 1850 and 2000, a per generation increase of 2.87 cm, was all due to natural selection? For this to happen, we would need a covariance between standardized height and relative fitness to be around 1.02. In Figure 3a, we show the association between male height and the number of children that needs to hold in each generation when average fertility equals 4 and the selection differential is 1.02. Men of average height would on average have 4 children. Men who are one standard deviation above average height would need to have on average 10 children, and those two standard deviations above average height would need to produce 30 children.

Scenario B: how severe must child mortality be to account for the increase in height?
Another way that natural selection could favour taller heights is if parental height is associated with child survival. We varied population child survival rates ( Figure 4) and examined what the magnitude of strength between paternal height and child survival needed to be to achieve a height of 182.5 cm after six generations. Child survival needs to be as low as 10% (i.e. child mortality 90%), and each standard deviation increase in parental height should be associated with 700% higher odds of a child surviving to get at 182.5 cm. Figure 3b visually displays the association between male height and child survival of this particular magnitude. Even with the enormous and implausible magnitude between paternal height and child survival, selection was very unlikely to or could not have reached 182.5 cm with higher survival percentages than 10% (Figure 4).

Scenario C: truncation selection: determining a threshold height for partnership
What if mate selection was so drastic that only those men who reach a certain height threshold could reproduce? Specifically, what threshold would be required in order to produce the observed upward shift in height over time? 7 This is akin to artificial breeding (or truncation selection) where only those animals whose trait value exceeds a given threshold were permitted to reproduce (Crow & Kimura, 1979;Falconer & Mackay, 1996;Matsumura et al., 2012). In this case, to achieve the increase in height over 150 years, then, across each of six generations, reproduction would need to be limited to the 37% tallest individuals in the population, with the remaining 63% of the shorter individuals not reproducing at all, if natural (truncated) selection alone was responsible for the drastic increase in height in the Netherlands (see footnote 7 and the supplementary materials for calculations). To get a sense of the different levels of thresholds and variation across simulations, truncated selection was simulated where  The y-axis shows the population level of child survival. To achieve the observed 182.5 cm in six generations, child survival needs to be as low as 10%. The association between male height and child survival has an odds-ratio of 7 for all distributions. No odds-ratio is sufficient to achieve the 182.5 cm when population level survival is 20% or higher. All simulations started at an average male height of 165.3 cm.
only the tallest 90%, 80%, . . . , 10% of the individuals could reproduce (through partnering) ( Figure 5). 8 When only the 10% tallest individuals reproduce, a shift of ~30 cm can be achieved in six generations. When only the 10% shortest individuals cannot reproduce, a shift of ~ 3.5 cm is observed. It is important to note that these calculations are only true under conditions that would be violated 'in nature', particularly the fact that the major shifts in the trait distribution due to natural selection does not change the fitness landscape of the trait and that there are some mechanisms that increase genetic variation in the population. 9

Scenario D: the evolution of height with observed selection gradients
How much of an increase in height do we see over time when we use estimates from contemporary populations on heritability and selection? For heritability, we assume a value of 0.8 (Silventoinen et al., 2012;Visscher et al., 2008). To get at the effect of different selection gradients, we modelled three situations ( Figure 6): one in which there is an absence of selection (i.e. no association between male height and the number of offspring); one in which we use the selection gradient as reported in the Stulp et al. (2015) paper (Poisson regression estimates of 0.022 for the linear and −0.011 for the quadratic term of standardized height 10 ), and one in which we use a linear selection gradient of 0.16 which was the median value of ~1000 estimates of selection in natural (animal) populations (Kingsolver et al., 2001).
On average, across 1000 simulations of 1000 agents based on the selection differential observed in Stulp et al. (2015), we find a median increase in height of 0.36 cm (in line with Tarka and colleagues). There is, however, considerable variation across replications: because of the low magnitude of the selection differential, in 21% of the simulations a decrease in the average height was found after six generations, whereas in 5% of the simulations there was an increase of 1.05 cm or higher. Modelling the median selection gradient of 0.16, results in a heftier response, with a median increase of 2.69 cm (Figure 6).

Scenario E: modelling the response to natural selection under most realistic conditions
In our final series of simulations, we try to incorporate as much empirical information as possible across the different generations. Thus, for each generation and based on various sources, we determine the fraction of people that have a partner, the association between height and having a partner, the fertility rate, the association between height and fertility, the association between height and child survival and the heritability of height (see Table 1). In these models, we also assume weak, but positive assortative mating for height (r = 0.2; Stulp et al. (2017)). 11 Many of our estimates come from the work of Thompson et al. (2021Thompson et al. ( , 2022, who showed that height was associated with marriage, fertility, and survival in a historical Dutch population. In this sample, there was not a particular advantage for taller men (average height men seemed to do 'better'), although shorter men were less likely to be married.  Figure 6. Changes in height after 6 generations of reproduction from a parental generation of 1000 individuals and 1000 replications. Heritability is 80% for the first three panels. In the left panel, there is no relationship between male height and number of offspring, in the second panel the relationship is similar to that observed in Stulp et al. (2015), in the third panel there is a linear selection gradient of 0.16 based on Kingsolver et al. (2001). and the fourth panel is based on table 1. Each grey line is a simulation run, the blue line is the predicted response from the breeder's equation (Lush, 1937).  Across 1000 simulations, we find a median increase in height of 0.07 cm (Figure 6; farright panel). In about 42% of the simulations, a decrease in average height was found. 12

Natural selection is an unlikely cause of the historical upward trend in height
To conclude with the obvious: natural selection alone cannot explain the trend in height in the Netherlands. The strength of selection needed to account entirely for the increase in height is enormous. Of course, no one would argue that the increase in height is solely due to natural selection, but the scenarios offer unequivocal evidence of how remarkable the trend in stature has been and how plastic the human body is.
If natural selection had been the only determinant of the increased trend, it would require that, in each generation, with each standard deviation increase in male height, the average number of children produced would triple. With an average fertility of four children, this means that men who were one, two, and three standard deviations above height would have, on average, about 10, 30, and 100 children, respectively.
Examining child survival rather than the production of children leads to a similar conclusion: only at very high levels of child mortality (>90%) and strong associations between father's height and child survival can the increase in height in the Netherlands in six generations be achieved. Given that the actual historical rates of child mortality were closer to 20% (around the 1850s, which decreased to 2% 100 years later; Wintle (2000)), no linear association between parental height and child survival could account for the drastic increase in height over time.
A third way of looking at this issue leads to the same conclusion: if only a select group of individuals could reproduce depending on their height, where would the threshold need to lie in order to produce the observed increase in height? The answer here is that only the 37% tallest individuals could reproduce, and the majority of the population would not have been able to reproduce at all. Even if we entertain the possibility that such a threshold could exist, the historical rates of childless men are nowhere near 63%. A more reasonable suggestion is that between 10-20% of the men have not produced offspring based on marriage rates (Wintle, 2000) and records of permanent celibacy (Engelen & Kok, 2003). Six generations of selection during which the 10% shortest men did not reproduce lead to an average increase in height of only 3.5 cm, which is very far from the observed 17.2 cm. However, the idea of a threshold of this magnitude is also at odds with reality: A study by Thompson et al. (2021) showed that, for men in the second half of the nineteenth century, the 20% shortest men were about half as likely to get married than average height men, but that there was no further advantage for taller heights. In this sample, the percentage of nonmarried men was about 20%. These empirical estimates clearly do not fit the pattern that would be needed to produce the increase in height over time when selection was the sole consequence of finding a partner.
What proportion of the secular trend in height in the Netherlands can be explained by natural selection, then? Probably near zero. In line with the conclusions of Tarka and colleagues, we show that estimates of the strength of phenotypic selection in a sample of Dutch men born between 1933 and 1969 , can only account for a 0.36 cm increase in height over six generations at best. In terms of the total increase of 17.2 cm (182.5-165.3; Figure 1, but see endnote 1), 0.36 cm (~2%) is obviously minimal. Tarka and colleagues were thus absolutely right in arguing that, based on the results of Stulp et al. (2015), selection has a minimal role to play in the secular trend in height in the Netherlands. This point is reinforced when, in addition to selection effects as reported in Stulp et al. (2015), we include recently reported associations between height, marriage, and fertility in a Dutch historical population (Thompson et al., 2021(Thompson et al., , 2022 in our simulations. The increase in height over six generations, then drops to a negligible 0.07 cm. This 0.36 cm or even 0.07 cm is probably even an upper bound estimate because of several assumptions in (or shortcomings of) the model. First, we do not include selection effects on women, which are likely to be either stabilizing (average height women have more children) or will favour shorter women (Byars et al., 2010;Sanjak et al., 2018;Stulp et al., 2015;Stulp, Kuijper, et al., 2012;. This reduces or even negates the positive response to selection in men. Second, our generation time of 25 years may have been on the young side: while our average age at marriage (by and large a pre-condition for having children throughout this historical period) may have been accurate for women in the 19 th century, men married some three years later. Age at marriage further increased moving into the 20 th century.
A third reason why we should consider the increase of 0.36/0.07 cm as an upper bound is that we did not take age at first birth into account. Reproducing earlier can have a higher pay-off in terms of fitness than reproducing more (Jones & Bird, 2014). Given that taller heights are consistently associated with delayed reproduction, this would also change the response to selection in favour of shorter average heights.

Additional limitations to our models
In our models, we assumed a relatively static environment and included no environmental effects on the stature itself, which can generate misleading expectations of phenotypic microevolution (Wilson et al., 2006). One particularly relevant environmental effect would be the resource dilution that accompanies increased family size and leads to shorter sibling heights (e.g. Lawson & Mace, 2008;Quanjer & Kok, 2019). If taller families have more children, then these families also have to compensate for the fact that parental resources are spread out over multiple children, which would decrease (some of) the heights of the children. If these shorter children who come from taller families then have less success in finding a partner compared to taller children from smaller and shorter families (i.e. where more resources per capita were available), then this would limit the response to selection. Modelling this would involve tracking both genetic and environmental effects on individuals' height, and how these distinct aspects of height factor into, among other things, patterns of partner choice and mortality.
To keep our model comparable to the breeder's equation we made some further assumptions that we know to be violated. First, all agents reproduced at age 25, and all had children at the same point in time after which their job was done. The subsequent generation would then do the same. Thus, there were no differences in the timing of births, which, as already noted, can have major consequences for fitness (Jones & Bird, 2014). Moreover, population dynamics and/or size played no role in our models, because generations were not overlapping, remarriage was not possible, and the potential pool of available partners was essentially infinite. Again, these model assumptions and simplifications are aspects that could be manipulated further, and explored in subsequent models.
One final and important caveat is that we did not include selection on women (or rather, we assumed it to be non-existent). This is problematic because selection on female height may well be stronger than on male height and may run in a different direction (Sanjak et al., 2018;Stearns et al., 2012;Stulp, Kuijper, et al., 2012;. Again, such sex-specific selection places constraints on the genetic response to selection. More generally, the effects of female height on mate choice (Stulp et al., 2013), mortality (Özaltin et al., 2010), and fecundity  need to be accounted for and explored in more detail, in addition to changing social structures that affect these relationships.

The strength of selection in human populations
If height was under selection according to the most frequently observed magnitude of selection on traits in animal populations (a selection gradient of 0.18; Kingsolver et al. (2001)), then natural selection could have resulted in an increase of 2.66 cm in just six generations which is certainly substantial. Whether this magnitude of selection is also plausible for humans, particularly in more recent times, is doubtful: Studies on selection in humans rarely show such a strength of a selection (although there is a possibility that this reflects our ability to acquire more precise measures of selection gradients in humans than we can for other species) (e.g. Sanjak et al. (2018)). If human selection gradients are genuinely lower, however, this raises questions of why this should be, and whether this has been true throughout our evolutionary history or is a more recent phenomenon (tied to patterns of cultural change). In other words, it may be the case that higher selection gradients operated when the genetic architecture of height was being shaped, and current evolutionary forces now differ. Simulation studies like ours are therefore useful at beginning to explore how and why evolutionary forces vary over time and the consequences of such variation. Nonetheless, even a genetic response to selection of 2.66 cm is relatively minor compared to the increase in height over time of around 17 cm, again highlighting how unique the upward trend in height is in evolutionary terms.

Simulating changes in height: going from patterns to processes
The use of our simulation models illustrates exactly how extreme conditions would need to be for natural selection to be responsible for the observed height increase in the Netherlands and highlights (the obvious conclusion) that environmental and sociocultural factors need to be included in our analyses. Adding these factors to traditional statistical models of inheritance is not feasible, but our simulations represent a step toward being able to incorporate both genetic and environmental effects into our analyses. Such future developments are crucial, given that many factors thought to be involved in the evolution of height, some of which are related to individual variation in resource access, may be mediated through non-random access to potential marriage partners and the existence of particular kinds of cultural norms. Simulations offer the possibility of investigating the manner in which such factors interact and, importantly, how they feed back on each other. That is, via simulation, we can generate and identify the causes of different evolutionary dynamics and illustrate how selection gradients shift given particular environmental backgrounds. This, then, is where a simulation approach can really pay dividends as there are ample historical data available that allow for both historically-based inputs and comparisons of outputs. Thus, future simulations could incorporate information on historically documented aspects of biology, demography, and patterns of cultural change. This allows us to go beyond documenting historical patterns, and begin to identify the processes likely to underpin them. For example, estimates of resource dilution as described above (i.e. larger families having shorter offspring due to lower levels of resources invested in each child) could be incorporated into simulations in order to assess the potential extent to which resource availability, as mediated by family size, influences height, which, in turn, might be linked to cultural shifts in child-bearing norms and age at marriage at the population level. That is, simulations can be used to investigate how spatial and temporal demographic patterns intersect with individual reproductive decision-making. As part of this, simulations are ideal for investigating counterfactual historical sequences, and determining the difference made by particular kinds of events (or their absence) (e.g. one could investigate how an earlier onset of modernisation influenced both evolutionary and secular trends in height in the Netherlands). There is ample opportunity to make use of these datasets in a way that can help us better understand patterns of evolution within human populations, and how the nature and strength of the evolutionary forces operating might change over time.

Conclusion
Our models based on historical data suggest that natural selection did not play a role in the evolution of height in the Netherlands (following Tarka et al. (2015)). Indeed, these models can be seen as an extension of Tarka and colleagues' calculations, with the added benefit that we were able to model alternative processes (e.g. mate choice, child survival, and differential fertility). Combining historical and contemporary estimates of, for example, celibacy, childlessness, child mortality, and knowledge of the effects of height on reproductive outcomes allowed us to quantify which processes are most likely to exert an effect, and which produced the largest effects on average heights across time. Our explorations of this score served to reinforce Tarka and colleagues' conclusion that even robust, 'highly significant' associations between fitness-related traits can be negligible when viewed on a longer-time scale and when other sources of variation also exist. Nevertheless, our simulation models allowed us to carefully scrutinise the magnitude of observed associations. Finally, simulation studies offer the potential to explore in greater depth how contemporary evolutionary forces may differ from those that acted to shape the genetic architecture of certain traits at an earlier point in our evolutionary history.

Notes
1. This is an underestimate of the average height of adults, because the conscripts whose heights were measures were around 18-19 years old, whereas growth continued up to age 25 (Beekink & Kok, 2017). Thus, the average heights of adult men in 1850 is~4-5 cm taller. The increase in height over time with these corrections is close to 16/17 cm (Tassenaar, 2020), which is not far off from our 17.2 cm. 2. Estimates of standard deviations are harder to come by than estimates of averages.
Available estimates of the standard deviation for male height tend to center around 7 cm. For instance, in the study by 2017), for~25 year old civic guards born between (2015), examining height of men born from 1933 to 1969, the range of standard deviation across birth years was 5.98 cm to 7.29 cm (for women: 5.02 cm to 6.58 cm). In a more recent study, a standard deviation of 7.1 cm was observed for 21-year-old men in 2009 (Schönbeck et al., 2013; with an average of 183.8 cm; for women: mean = 170.7 and SD = 6.3 cm). We assume a standard deviation of 7 cm for men. This assumption is clearly flawed because the standard deviation itself responds to environmental inequality. 3. For the model the average number of children does not matter, given that the important relation is the covariance between height and relative fitness and because generations are completely separated from one another (all agents reproduce at age 25 and only partner with agents from the same generation). Thus, the increase in height in the next generation would be similar if taller men had 6 children compared to 3 children of average height men as when taller men had 4 children compared to 2 children. 4. For reasons of computing efficiency, we restrict the population of offspring to be of similar size as the parental generation, by randomly drawing the desired number of agents from the pool of offspring. For example, if each of the 1000 agents has exactly 3 children and we store information on all parents and 6 generation of descendants, we would need to store a dataset of 1000 × 3 6 = 729,000 agents. If we restrict future generations to be of size 1000, this is reduced to 7000. 5. For the ease of calculating midparental height and offspring heights, we assumed that the variance in height of mothers and fathers were identical (Falconer & Mackay, 1996, p. 168). This does not matter a great deal given that our models effectively are about male agents producing sons. The correlation between midparental height and offspring heights are identical to the heritability when pairing is random. 6. The selection differential S is conceptualised differently in the literature and it is simultaneously defined as the difference in average height of the parental and the offspring generation, the covariance between the trait and relative fitness and the average height of parents weighted by their number of children (Falconer & Mackay, 1996;Heywood, 2005;Lande & Arnold, 1983;Morrissey et al., 2010;Price, 1970). They all come down to the same thing. 7. This percentage can be determined by rearranging the breeder's equation. In the case of truncated selection, the selection differential divided by the standard deviation in height gives the intensity of selection. This intensity of selection, i, relates to the proportion of individuals that are selected as parents for reproduction by making use of the standard normal distribution and the formula z divided by p (in which p is the proportion of individuals selected, and z is the height of the normal distribution that corresponds to p) (Falconer & Mackay, 1996, p. 192). This means that the genetic response to selection can be expressed in the following way: R ¼ h 2 S ¼ ih 2 σ p . Given that we only select men who will be randomly paired to a partner, we need to multiply this equation by half (Falconer & Mackay, 1996, p. 194). Applying it to our case, we assume that the (per-generation) genetic response to selection, R, equals 2.87 cm. Solving the equation with a heritability h 2 of 0.8 and standard deviation σ p of 7 cm, gives an intensity of selection i equal to 1.025. This corresponds to 37% of individuals being selected for parenting (because a p equal to 0.37 corresponds to a Z-score of 0.33, and the height of the normal distribution at Z equal to 0.33 results in a z of 0.38; 0.38/ 0.37 ≈ 1.025; see also supplementary materials). 8. We varied the starting population size and the average number of children born, to guarantee that minimally 1000 offspring were born that could be selected as parents for the next generation. For instance, when only 10% of individuals could reproduce, the initial population size was 10,000, of which 1000 reproduced. Each of these 1000 individuals had 10 children, so that in the next generation again 10% of 10,000 were selected. 9. When we don't scale the standard deviation in the offspring generation to be similar to that of the parental generation, the genetic response to selection is much smaller because heritable variation in height is reduced in each generation because only the tallest individuals get to reproduce.
10. Tarka and colleagues used a linear regression estimate of 0.014 which was an estimate of height (selection gradient) that was controlled for many other variables (e.g. education, income, health). The precise meaning of this estimate is that one standard deviation increase in height leads to an increase in the natural log in the number of children of 0.014, or differently said, leads to an increase in the number of children by 1.4% (e 0.014 = 1.014). Adding those variables reduced sample size by 20% and the remaining sample probably suffered from sample bias because of the many individuals not responding on their income. Tarka et al calculated on the basis of the breeder's equation that the per generation response to selection was 0.38 mm, leading to a negligible increase of 2.28 mm after six generations. 11. Assortative mating increases the heritability in the subsequent generation, but the increases are modest and at most about ten percent (Falconer & Mackay, 1996, p. 177). We did not take into account this increase in heritability, which means that the increase in height due to a genetic response is underestimated in this simulation. 12. The reason why the blue line from the breeder's equation in the right panel does not map neatly onto the outcomes of the simulations, is because the breeder's equation as we have calculated it 1) does not take into account assortative mating, and 2) is based on the selection gradients between height and number of offspring born in men that have a partner, and thus ignores both unpartnered men and child survival. For the other panels in the figure, this is taken into account.