Introduction

Adult human body height or stature, due to its high heritability (>0.8) (Carmichael and McGue 1995; Macgregor et al. 2006; Phillips and Matheny 1990; Silventoinen et al. 2000), quantitative nature, easy accessibility, and accurate measurability, serves as a reference trait for studying the genetics of human complex traits (Aulchenko et al. 2009; Campbell et al. 2005; Hirschhorn et al. 2001). Until early 2008, however, only a few genes were known to be involved in normal height variation in general human populations (Perola et al. 2007; Silventoinen et al. 2003; Visscher et al. 2006). In 2008, three genome-wide association studies (GWAS) comprising tens of thousands of individuals highlighted 54 genetic loci displaying statistically significant association with body height (Gudbjartsson et al. 2008; Lettre et al. 2008; Weedon et al. 2008). Forty of these loci (74.1 %) had not been previously implicated in human body height, and many were from outside the biological pathways known or expected to regulate human growth. These studies on one hand convincingly confirmed the polygenic nature of adult height and on the other hand revealed several new genetic pathways involved in growth regulation, such as Hedgehog signaling and basic cell cycle regulation. In 2010, a meta-analysis of all available GWAS data from 46 studies comprising 180,000 individuals raised the number of known genetic loci associated with human height to at least 180 (Lango Allen et al. 2010). This work was done under the auspices of the Genetic Investigation of ANthropometric Traits (GIANT) Consortium, referred to as the GIANT study below. In spite of the large number of genetic loci identified, the associated single nucleotide polymorphisms (SNPs), according to the authors, explained only about 10 % of the sex- and age-adjusted height variance in the study populations (Lango Allen et al. 2010). The GIANT study pointed out that allelic heterogeneity is likely a frequent feature for human polygenic traits in general, and for adult body height in particular.

The ability to predict the extreme forms of adult body height based on the parental or the donor’s DNA has important practical implications in pediatric endocrinology (Unrath et al. 2012) as well as in forensic investigation (Kayser and de Knijff 2011), to name only two examples. Using 54 height-associated SNPs available in 2009, Aulchenko and colleagues (2009) constructed a genomic profile and showed that extreme forms of human body height were predictable with only limited accuracy. In the present study, we aim to update the predictive capacity for human tall stature available with the extended list of 180 height-associated SNPs from the GIANT study using a European study population from the Netherlands that was enriched by people of tall stature. We focus on tall stature here because it had been shown previously that the effects of height-associated common DNA variants are consistent with the predicted polygenic effects in extremely tall but less so in extremely short individuals (Chan et al. 2011).

Methods

The Dutch tall cohort (DT)

From the records of the Division of Pediatric Endocrinology at the Erasmus University Medical Center, Sophia Children’s Hospital, we identified former patients who attended this clinic for evaluation of tall stature. Eligible subjects were traced using municipal registries and invited by mail to participate in this study. We also identified several healthy tall individuals through advertisement in local specialized shops, sports centers, and institutions of higher education in Rotterdam. Subjects eligible for participation fulfilled the following inclusion criteria: (1) standard deviation score (SDS) above +1.88 SD according to Dutch standards (http://www.tno.nl/groei), which corresponds to the 3 % upper tail of the height distribution in Dutch adults [approximately >195 cm in men and >180 cm in women at age 30, after correcting for secular trend (Fredriks et al. 2000)]; and (2) Dutch European ancestry defined as being born to Dutch parents who themselves were born in the Netherlands. Three subjects with endocrine or metabolic disorders or primary or secondary growth disorders were excluded. Subjects fulfilling the inclusion criteria who had been treated with high-dose sex steroids to limit growth in adolescence were eligible to participate and were considered as being tall. Participants were invited to visit the outpatient clinic of the Erasmus Medical Center. Height was measured using a stadiometer (SECA 225; SECA, Hamburg, Germany). Genomic DNA was extracted from venous blood using standard methods. After phenotypic and genotypic quality controls, the DT cohort included 462 unrelated Dutch tall individuals. These tall subjects are independent of the GIANT study.

The Rotterdam Study (RS)

The RS is a population-based prospective study including a main cohort (RS-I) and two extensions (RS-II and RS-III) (Hofman et al. 1991, 2007, 2009). The height and genetic data regarding RS subjects have been described in the GIANT paper. In brief, all RS participants were examined in detail at baseline including a quantitative measurement of body height using a stadiometer. The Medical Ethics Committee of Erasmus University Medical Center approved the study protocol and all study participants provided written informed consent. After genomic and phenotypic quality controls, the current study included 5,748, 2,152, and 1,999 participants from RS-I, RS-II, and RS-III, respectively. The tall stature in RS was set as the sex- and age-adjusted residuals >1.88 standard deviations (193 cases in RS-I, 49 in RS-II, and 66 in RS-III).
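The case definition above (adjusted residuals >1.88 SD, roughly the upper 3 % of a normal distribution) can be illustrated with a minimal sketch. The text does not state how the sex and age adjustment was done, so the sketch assumes a simple linear regression of height on age fitted within one sex at a time; the function names are hypothetical and not taken from the study.

```python
import math
from statistics import mean, pstdev

TALL_Z = 1.88  # cut-off quoted in the text


def upper_tail(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def adjusted_z_scores(heights, ages):
    """Residuals of height regressed on age (assumed adjustment, one sex at a time),
    scaled to unit standard deviation."""
    a_bar, h_bar = mean(ages), mean(heights)
    sxx = sum((a - a_bar) ** 2 for a in ages)
    sxy = sum((a - a_bar) * (h - h_bar) for a, h in zip(ages, heights))
    slope = sxy / sxx
    resid = [h - (h_bar + slope * (a - a_bar)) for a, h in zip(ages, heights)]
    sd = pstdev(resid)
    return [r / sd for r in resid]


def flag_tall(heights, ages, cutoff=TALL_Z):
    """True for individuals whose adjusted z-score exceeds the cut-off."""
    return [z > cutoff for z in adjusted_z_scores(heights, ages)]
```

Here `upper_tail(1.88)` gives the normal tail fraction that the 1.88 SD threshold corresponds to, and `flag_tall` returns one boolean per individual.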

Microarray genotyping and quality control (QC)

Details regarding genotyping, SNP imputation and QC of RS individuals can be found elsewhere. DT individuals were de novo genotyped using the Illumina Human610Quad array (with 620,901 markers, both SNPs and CNV probes) according to standard procedures. The standard QC of the samples included the analysis of genotype-called sex, call rate, heterozygosity, homozygosity, assessment of cryptic family relationships and assessment of population admixture/stratification. Multidimensional scaling (MDS) analysis was conducted using identity-by-state (IBS) pair-wise distances with reference to individuals of the HapMap Phase II panel (http://hapmap.ncbi.nlm.nih.gov/). Potential population stratification was initially controlled by exclusion of individuals of non-Northwestern European origin, defined as those deviating more than 4 standard deviations from the mean of the HapMap CEU panel. Genotypes with minor allele frequency (MAF) > 1 %, SNP call rate > 98 % and Hardy–Weinberg equilibrium (HWE) P > 1 × 10−4 were then imputed to 2,543,887 SNPs using MACH (Li et al. 2009) with reference to the phased autosomal chromosomes of the HapMap CEU Phase II panel (release 22, build 36). The pair-wise IBS matrix between individuals was recalculated using a subset of pruned SNPs (N = 50,000) that are in approximate linkage equilibrium. No close relatives were identified. We merged DT with RS using Genome Studio software v2010, compared the SNP call rate by cohort status, and excluded 13,330 SNPs with a significant difference (P < 0.01). This QC step is important for reducing potential false positives due to differential missingness between cohorts.
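The per-SNP filters named above (MAF > 1 %, call rate > 98 %, HWE P > 1 × 10−4) can be sketched from genotype counts as follows. The text does not say which HWE test was used, so the sketch assumes a 1-df chi-square goodness-of-fit test; `snp_qc` and its parameters are hypothetical names introduced for illustration.

```python
import math


def snp_qc(n_aa, n_ab, n_bb, n_missing,
           min_maf=0.01, min_call_rate=0.98, min_hwe_p=1e-4):
    """Return (passes, maf, call_rate, hwe_p) for one SNP from genotype counts."""
    n_called = n_aa + n_ab + n_bb
    call_rate = n_called / (n_called + n_missing)
    p = (2 * n_aa + n_ab) / (2 * n_called)  # frequency of allele A
    maf = min(p, 1 - p)
    if maf == 0:
        return False, maf, call_rate, 1.0  # monomorphic SNPs fail the MAF filter
    # Assumed test: chi-square goodness of fit against Hardy-Weinberg proportions
    expected = (n_called * p * p,
                2 * n_called * p * (1 - p),
                n_called * (1 - p) ** 2)
    chi2 = sum((o - e) ** 2 / e for o, e in zip((n_aa, n_ab, n_bb), expected))
    hwe_p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    passes = maf > min_maf and call_rate > min_call_rate and hwe_p > min_hwe_p
    return passes, maf, call_rate, hwe_p
```

An exact HWE test would be preferable for rare alleles; the chi-square version is used here only because it needs nothing beyond the standard library.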

Association analysis at 180 height-associated regions

Tall stature was treated as a binary trait (tall or non-tall). Since tall stature was defined using sex- and age-adjusted residuals, sex and age were no longer considered in subsequent analyses. We selected the 180 SNPs described in Supplementary Table 1 of the GIANT paper (Lango Allen et al. 2010). Association was tested using logistic regression, from which allelic odds ratios (ORs) were derived. We considered P values <0.05 as statistically significant. Next, we extended the 180 GIANT SNPs to 180 regions defined as ±200 kb around each GIANT SNP and conducted association analysis as well as conditional association analysis (conditional on the effect of the GIANT SNP in each region) searching for secondary signals, where the minimal P values were Bonferroni corrected for the number of SNPs in each region. Haplotypes were inferred from genotypes using the expectation–maximization algorithm implemented in the R library haplo.stats. We tested whether the observed ORs are consistent with the expected ones, based on the previously estimated effect sizes from the GIANT study, using the method proposed by Chan et al. (2011). In brief, we calculated the expected OR and its expected 95 % confidence interval for each SNP by estimating the odds of the effect allele versus the non-effect allele in the tall cases and controls, assuming a standard normal distribution for the standardized height, based on the formula provided by Chan et al., except that we used 1.88 for integration (representing the 3 % tallest) instead of ±2.326 (representing the 1 % two-sided tails).
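The idea behind the expected OR can be sketched as below. This is not the formula of Chan et al., which is not reproduced in the text; it is a simplified approximation that assumes Hardy-Weinberg genotype frequencies, an additive per-allele effect `beta` in SD units, and unit residual variance. It also omits the expected confidence interval. The function names are hypothetical.

```python
import math


def norm_sf(x):
    """Upper-tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def expected_allelic_or(beta, freq, cutoff=1.88):
    """Approximate expected allelic OR for tall (z > cutoff) versus non-tall individuals.

    beta: per-allele effect on standardized height (assumed additive)
    freq: effect-allele frequency
    """
    q = 1 - freq
    geno_freq = {0: q * q, 1: 2 * freq * q, 2: freq * freq}  # Hardy-Weinberg
    case_eff = case_non = ctrl_eff = ctrl_non = 0.0
    for g, f in geno_freq.items():
        mean_z = beta * (g - 2 * freq)       # genotype mean, population mean fixed at 0
        p_tall = norm_sf(cutoff - mean_z)    # P(z > cutoff | genotype)
        case_eff += f * p_tall * g           # effect alleles among cases
        case_non += f * p_tall * (2 - g)     # non-effect alleles among cases
        ctrl_eff += f * (1 - p_tall) * g
        ctrl_non += f * (1 - p_tall) * (2 - g)
    return (case_eff / case_non) / (ctrl_eff / ctrl_non)
```

With `beta = 0` the function returns an OR of 1, and the OR moves away from 1 as the per-allele effect grows, which is the comparison made against the observed ORs.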

Prediction analysis

There are several competing methods available for predicting tall stature from DNA variants. In this study, we selected the weighted allele sums (WAS) (Aulchenko et al. 2009) method to allow direct comparison with the previous study of Aulchenko, also because its performance was not significantly lower than the other methods tested in our dataset (see Supplementary materials and Supplementary Fig. 1a, b). The WAS is a weighted prediction of height for each individual based on either logistic regression betas from the training set of the current study (80 % of samples) or the previously estimated betas of linear regressions from the GIANT study. The model performance was cross-validated in the remaining 20 % samples (number of cross-validations = 1,000), where the prediction accuracies were derived. We derived the area under the receiver operating characteristic (ROC) curves, or AUC (Janssens et al. 2004), as the accuracy estimate. AUC is the integral of ROC curves, which ranges from 0.5 representing total lack of prediction to 1.0 representing completely accurate prediction.
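A minimal sketch of the two quantities described above: the weighted allele sum for one individual, and the AUC computed as the probability that a randomly chosen tall case scores higher than a randomly chosen control. The toy genotypes and betas are invented for illustration only, and the study itself used R libraries rather than this code.

```python
def weighted_allele_sum(genotypes, betas):
    """WAS for one individual: sum over SNPs of beta * allele count (0, 1 or 2)."""
    return sum(b * g for b, g in zip(betas, genotypes))


def auc(scores, is_tall):
    """AUC as P(case score > control score), counting ties as one half."""
    cases = [s for s, t in zip(scores, is_tall) if t]
    controls = [s for s, t in zip(scores, is_tall) if not t]
    wins = sum((c > d) + 0.5 * (c == d) for c in cases for d in controls)
    return wins / (len(cases) * len(controls))


# Hypothetical toy data: three SNPs, four individuals
betas = [0.4, -0.2, 0.1]
people = [[2, 0, 1], [1, 1, 2], [0, 2, 0], [1, 2, 1]]
tall = [True, True, False, False]
scores = [weighted_allele_sum(g, betas) for g in people]
toy_auc = auc(scores, tall)
```

In a cross-validation setting, the betas would be estimated in the training split and `auc` evaluated on the held-out split, repeated over many random splits. The pairwise comparison is quadratic in sample size, which is acceptable for a sketch but slower than a rank-based computation.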

Prediction analyses were conducted using four different lists of SNPs as predictors: (1) 180 SNPs randomly selected over the microarray, (2) 54 SNPs previously used by Aulchenko et al. (2009), (3) 180 GIANT SNPs (Lango Allen et al. 2010), (4) 180 GIANT plus two additional SNPs from current study showing significant secondary association with tall stature in a reanalysis of the 180 regions. All prediction analyses were conducted using R scripting and R libraries “e1071”, “glmnet”, “nnet”, “rpart” and “randomForest” accepting default tuning parameters, and the AUC values were derived using R library “verification”, all freely available from the CRAN website (http://cran.r-project.org/).

GWAS

We conducted a case–control designed GWAS for tall stature in 10,361 Dutch Europeans (see Supplementary materials, Supplementary Fig. 2).

Results

A binary tall stature was set as the sex- and age-adjusted height residuals >1.88 standard deviations (770 tall cases and 9,591 non-tall controls, Fig. 1). More than half (60 %) of the tall cases were from the Dutch tall cohort (N = 462), which was a completely independent cohort from the GIANT study population, whereas the remaining tall individuals as well as the normal height controls came from the Rotterdam study previously included in the GIANT meta-analysis (Lango Allen et al. 2010). Among the 180 height-associated SNPs reported in the GIANT study and tested here, 166 (92.2 %) had the same height-increasing alleles in our case–control samples as reported in the GIANT study; 75 (41.7 %) were nominally significantly associated with tall stature (P < 0.05); and none of the significant associations were in the opposite allele directions (Supplementary Table 1). The probability of observing 75 or more significant P values out of 180 tests is extremely small (P = 7.1 × 10−50) under the null binomial distribution. We tested whether the observed ORs are consistent with the expected ORs based on the previously estimated effect sizes from the GIANT study. Similar to that observed by Chan et al., the number of SNPs with observed OR greater than expected OR was no different from expectation under the model of equal effect sizes in tall cases and controls (97/180 SNPs, P = 0.11, Supplementary Table 1). A multivariate logistic regression including all 180 SNPs showed highly consistent results with the analysis of individual SNPs, where 70 (38.9 %) were nominally significant and all of these 70 were directionally consistent with the GIANT study (Supplementary Table 1). These results indicate that the 180 GIANT SNPs are highly informative for tall stature in our dataset, which motivated us to carry out a series of prediction analyses of tall stature utilizing these DNA variants.
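The enrichment argument above (75 or more nominally significant results out of 180 tests) rests on an upper binomial tail. A minimal sketch follows, assuming a per-test null probability of 0.05; the text does not state the exact null rate or tail convention used, so the value obtained here need not match the reported one.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Assumed null: each of the 180 tests is significant with probability 0.05
p_enrichment = binomial_tail(75, 180, 0.05)
```

Under this null only about nine significant results would be expected, so the tail probability is vanishingly small.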

Fig. 1 The distribution of sex- and age-adjusted height residuals (z-score) in the four studied Dutch subpopulations. Dutch Tall cohort (DT) = 462, Rotterdam Study 1 cohort (RS-I) = 5,748; Rotterdam Study 2 cohort (RS-II) = 2,152; Rotterdam Study 3 cohort (RS-III) = 1,999; total N = 10,361. The vertical blue dashed line indicates the cut-off value (1.88 SD) for cases

A WAS prediction analysis using 180 SNPs repeatedly and randomly selected from the whole SNP microarray dataset gave an average AUC of 0.52, slightly higher than the null value of 0.50 (Table 1). This finding is not surprising given the highly polygenic nature of the trait, i.e., one may expect some of the randomly selected SNPs to be associated with height simply by chance. Conducting the WAS prediction analysis using the 54 SNPs previously applied for tall stature prediction by Aulchenko et al. (2009) gave a mean cross-validated AUC of 0.67 (95 % CI 0.63–0.70, Table 1) in our study population for predicting the 3 % tallest, which was consistent with the one reported by Aulchenko et al. (AUC = 0.65 for predicting the 5 % tallest) in the RS-I population, which partly overlaps (25 % of all cases and 58 % of all controls) with the samples used here. Repeating the prediction analysis using the 180 GIANT SNPs, however, resulted in a substantial improvement in prediction accuracy with a mean cross-validated AUC of 0.75 (95 % CI 0.72–0.79, Table 1) in our sample set. The WAS score calculated using the 180 GIANT SNPs also showed a significant increase from non-tall (mean −1.68) to tall individuals (mean −0.31, t test P < 1 × 10−300, Fig. 2). In terms of the percentage of variance in the sex- and age-adjusted height residuals explained, the 180 GIANT SNPs (cross-validated mean R² = 12.14 %, 95 % CI 10.35–14.07) explained more than twice as much as the 54 SNPs previously used by Aulchenko et al. (cross-validated mean R² = 5.40 %, 95 % CI 3.75–6.97, Table 1).

Table 1 Accuracy for predicting tall stature using SNP genotypes in 10,361 Dutch Europeans, where 770 (7.43 %) tall people were defined as body height > +1.88 standard deviations of sex- and age-adjusted height residuals

Fig. 2 Distribution of weighted allele sums (WAS) in 770 tall and 9,591 non-tall Dutch Europeans. a Histogram; b density plot. The WAS score was calculated for each individual using the 180 GIANT SNPs as regression beta × the number of minor alleles (not necessarily the height-increasing allele)

The above prediction accuracies were estimated using the WAS score weighted by the logistic regression coefficients in our Dutch sample. Weighting the WAS score directly using the linear regression betas from the GIANT paper also resulted in highly comparable accuracy estimates (AUC = 0.74 and R² = 12.03 %), indicating that our prediction model is likely extendable to other European populations. Further restricting the prediction analysis to the RS sample also resulted in comparable accuracy estimates (AUC = 0.75, R² = 12.16 %), indicating that the accuracy is unlikely to be over-estimated because the RS sample was part of the GIANT study. This is expected because the Rotterdam sample represents only a small fraction of the GIANT study, and also because the majority of the tall cases in the current study were ascertained independently of the Rotterdam study and were not used in the GIANT study. Several genes/loci were highlighted as containing SNPs contributing to the prediction accuracy with ∆AUC > 0.01, such as ADAMTS10, PCSK5, LTBP2, EFEMP1, PTCH1/FANCC, CABLES1, NPR3 and ADAMTSL3, representing the best tall stature DNA predictors in our study. Yang et al. (2010) have proposed that reanalysis of the already discovered regions may reveal new variants further explaining phenotypic variance. In this study, the prediction accuracy could be slightly (AUC = 0.76, 95 % CI 0.73–0.80, Table 1) but significantly (P < 0.01, t test based on 1,000 cross-validations) improved by including two newly identified SNPs (rs12048049 in TGFB2 and rs10869665 in PCSK5) showing significant secondary association with tall stature (see below).

We reanalyzed all SNPs flanking the 180 GIANT SNPs (±200 kb, total 36,442 SNPs), which did not reveal additional SNPs with genome-wide significance except two signals at LTBP2 and ADAMTS10 (Fig. 3a). The observed P values at these 180 regions deviated far from the expected ones under the null hypothesis (Fig. 3b). This can be largely explained by linkage disequilibrium (LD) around known signals, supporting the involvement of these regions in tall stature. A conditional analysis (conditioned on the GIANT SNPs in each region), however, revealed two different SNPs in two different regions that were highly significantly associated with tall stature: rs12048049 in TGFB2 (P = 1.8 × 10−13) and rs10869665 in PCSK5 (P = 7.8 × 10−11, Fig. 3c). For both regions, individual SNPs were associated with tall stature at only nominal significance (0.001 < P < 0.05). We, therefore, further conducted a diplotype analysis for the associated SNPs and identified several large ORs by contrasting different diplotype groups. For example, the AC/GC diplotype group of the two SNPs rs6684205 (with alleles AG, from GIANT) and rs12048049 (CG, the secondary signal) at TGFB2 had a significantly increased fraction of tall stature compared to the reference AC/AC group (OR = 5.20, 95 % CI 3.24–8.33, P = 8.01 × 10−12, Table 2). Likewise, the GC/AT group for rs11144688GA and rs10869665CT at the PCSK5 locus had a significantly decreased fraction of tall stature compared to the reference GC/GC group (OR = 0.38, 95 % CI 0.27–0.55, P = 1.92 × 10−7, Table 2). These secondary association signals were new relative to the GIANT findings, which may be explained by different study designs. The conditional analysis in the GIANT study included 225 SNPs as covariates and was conducted in each stage-I cohort then meta-analyzed, whereas our analysis was iteratively conditioned on each GIANT SNP using individual-level data. The observed P values from the conditional analysis also deviated substantially from the null (Fig. 3d). The remaining LD at these loci alone cannot fully explain this massive deviation, including the highly significant secondary signals at TGFB2 and PCSK5, which strongly suggests that allelic heterogeneity plays an important role not only for normal height variation but also for tall stature in humans.
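Diplotype contrasts of the kind reported above compare the odds of tall stature in one diplotype group with those in a reference group. A minimal sketch of an odds ratio with a Wald 95 % confidence interval (Woolf's log method) from a 2 × 2 table is given below. The text does not state how the reported intervals were obtained, so this method is an assumption, and the counts in the example are hypothetical, not the study's data.

```python
import math


def odds_ratio_ci(case_group, case_ref, ctrl_group, ctrl_ref, z=1.96):
    """OR and Wald CI comparing a diplotype group with the reference group.

    case_group / ctrl_group: tall and non-tall counts in the diplotype group
    case_ref / ctrl_ref:     tall and non-tall counts in the reference group
    """
    odds_ratio = (case_group * ctrl_ref) / (case_ref * ctrl_group)
    se_log_or = math.sqrt(1 / case_group + 1 / case_ref
                          + 1 / ctrl_group + 1 / ctrl_ref)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only
or_est, ci_low, ci_high = odds_ratio_ci(20, 80, 10, 90)
```

A logistic regression with the diplotype group as a categorical predictor would give the same point estimate and allows covariates to be added.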

Fig. 3 Regional Manhattan plot for SNPs associated with tall stature at the 180 GIANT loci (total 36,442 SNPs) in 770 tall and 9,591 non-tall Dutch Europeans. a Initial results; b QQ plot of the initial results; c conditioned on the GIANT SNPs in each locus; d QQ plot of the conditional analysis

Table 2 Diplotypes at TGFB2 and PCSK5 associated with tall stature in 10,361 Dutch Europeans (N tall = 770)

Finally, we compared our data with two recent candidate gene studies on tall stature. The growth hormone (GH)/insulin-like growth factor-1 (IGF-1) axis is a key regulator of somatic growth in humans. We observed significant genetic association with tall stature for IGF2BP2 (rs720390, P = 4.1 × 10−2), IGF2BP3 (rs12534093, P = 1.1 × 10−3), IGF1R (rs2871865, P = 4.7 × 10−4) and GH1 (rs2665838, P = 1.5 × 10−2) but not for GHSR (rs572169, P = 0.37, Supplementary Table 1). These findings are highly consistent with a recent candidate gene study focusing on the GH axis in relation to extreme tall stature (Hendriks et al. 2011a). Recently, a candidate gene study of HMGA2 reported an effect of rs1042725 in extremely tall stature (OR for C allele = 1.53, 95 % CI 1.02–2.28; P = 0.03) (Hendriks et al. 2011b). In our study of an increased sample size, this SNP also showed significant but smaller effect on tall stature (OR for C allele = 1.15, 95 % CI 1.03–1.27; P = 0.01), whereas another HMGA2 SNP rs1351394 reported by the GIANT study showed a slightly larger effect (OR for T allele = 1.17, 95 % CI 1.05–1.29; P = 3.45 × 10−3, Supplementary Table 1).

Discussion

With the present study we demonstrated that a large fraction of the 180 SNPs previously associated with normal height in Europeans is also associated with tall stature in Europeans. Notably, the 180 SNPs were previously identified in samples from 46 studies, comprising 183,727 individuals with recent European ancestry from various regions of Europe, whereas our study was carried out in a European population from a single country (i.e., the Netherlands). The vast majority of these 180 previously normal height-associated SNPs in our study had odds ratios for tall stature that were directionally consistent with those for normal height in the GIANT study. These results confirm the findings of Chan et al. (2011) in that height-associated common DNA variants show predicted polygenic effects in tall individuals, i.e., the observed ORs are consistent with the expected ORs from the previously estimated effect sizes from the GIANT study. However, this does not contradict the hypothesis that rare variants may still explain a substantial portion of the trait heritability.

Our data demonstrate that common SNPs allow the genetic prediction of tall stature with an accuracy that considerably exceeds a previous attempt. Aulchenko and colleagues (2009) previously established a genomic profile based on 54 height-associated SNPs available in 2009, which showed a relatively low predictive accuracy (AUC = 0.65 for predicting a person falling into the 5 % tallest of their study and AUC = 0.67 for predicting the tallest 3 % in the current study). These authors also estimated that in order to achieve an AUC of 0.80, one needs to explain at least three times the amount of phenotypic variance that is explained by the 54 SNPs they used. On the other hand, they showed that the traditional approach of predicting height using mid-parental height, the so-called Victorian Galton’s method, achieved an AUC of 0.84 when predicting the tallest 5 %. The authors suspected that the Victorian Galton’s method will long stay unsurpassed in terms of predictive accuracy. Unfortunately, this approach is of no use in those practical applications of height prediction where parental height is usually not available, such as in forensic investigations. Here, we empirically demonstrate that the extended list of 180 SNPs predicts tall stature with a reasonably high accuracy (AUC of 0.75), which far surpasses that of Aulchenko et al. (AUC of 0.65 or 0.67) and approaches, although it does not yet surpass, the non-genetic Galton’s method. However, the genomic prediction accuracy for tall stature obtained here is likely to be further improved by the outcomes of future work on the genetic basis of human height. Considering the increasingly smaller effect sizes of the remaining common variants to be discovered by GWAS, it is rather unlikely that the genomic prediction accuracy can be improved substantially by undiscovered common variants at novel loci, even if extremely large studies are conducted (ongoing extension of the GIANT study). A more promising approach to further improve the genomic predictability of tall stature may be to consider genetic interactions and compound alleles of all variants within the already discovered loci, as will become feasible with next generation exome or whole genome sequencing data.

The ability to accurately predict the extreme forms of adult human body height using DNA variants is of practical relevance in several applications. In pediatric endocrinology, for instance, early knowledge about abnormal stature is important for making clinical decisions on growth-advancing and height-limiting treatments (Hendriks et al. 2010, 2011c, 2012). Current practice in predicting the expected adult body height of a child relies on bone age (Thodberg et al. 2009) and mid-parental height (Su et al. 2011). Although the accuracy of Galton’s mid-parental height method was estimated to be as high as an AUC of 0.84, its applicability may be hampered by assortative mating and poor parent–offspring correlation. These methods may at least be improved by including the child’s or parental DNA variants as additional predictors.

Another application of DNA-based prediction of tall human stature is in Forensic DNA Phenotyping (FDP), a young field of forensic genetics aiming to provide investigative leads in suspect-less criminal cases by inferring externally visible characteristics of the sample donor directly from DNA obtained from the crime scene (Kayser and de Knijff 2011; Kayser and Schneider 2009). Successful examples of FDP include DNA prediction of certain eye and hair color categories, which can be predicted with a much higher accuracy from a much smaller number of SNPs (Branicki et al. 2011; Liu et al. 2009a). For these traits, DNA test systems have already been developed and validated for practical forensic casework applications (Walsh et al. 2011a, b, 2012a, b) and are starting to be used in routine forensic practice.

Our GWAS in a Dutch European sample set enriched for tall individuals highlighted two loci significantly associated with tall stature (LTBP2 and ADAMTS10), both of which overlap with loci previously reported to show genome-wide significant association with normal stature in the GIANT study. Although the effective sample size (N ~ 25,600 from power calculation) is increased by the extreme design, this number is still far from comparable to the GIANT study, which used 134,000 individuals in the initial discovery phase. Therefore, this sample is expected to have low power for finding new height loci unless they are specifically involved in extreme tall stature or much more involved in extremely tall than in normal stature. Our results suggest that such loci are unlikely to exist.

When analyzing the genomic regions that include the 180 GIANT SNPs, we found SNPs at two loci (LTBP2 and ADAMTS10) showing association with tall stature at the genome-wide significance level. Mutations in LTBP2 and ADAMTS10 have long been known to cause autosomal recessive Weill-Marchesani syndrome (Dagoneau et al. 2004; Haji-Seyed-Javadi et al. 2012). We also found SNPs at two genes (TGFB2 and PCSK5) showing highly significant secondary association signals. This strongly suggests that allelic heterogeneity exists at least in these associated regions. Including these two newly identified SNPs in the prediction model improved the prediction accuracy slightly but significantly, underlining the importance of regional fine analysis of the already discovered loci/regions for finding new variants, as also proposed by Yang et al. (2010). TGFB2 encodes transforming growth factor beta 2 (TGF-β2), a member of the TGF-β family of proteins that control proliferation, cellular differentiation, and other functions in most cells. The role of TGF-β signaling in growth and related disorders has been systematically reviewed (Le Goff and Cormier-Daire 2012). TGF-β is also known to prevent the impaired chondrocyte proliferation induced by unloading in growth plates of young rats (Zerath et al. 1997). PCSK5 encodes the proprotein convertase subtilisin/kexin type 5, the expression of which has been linked to developmental dynamics in mice (Liu et al. 2009b; Szumska et al. 2008) and zebrafish (Chitramuthu et al. 2010). Several loci did not show statistically significant association, likely due to low MAF (<5 %, and thus low power), but showed consistent allele directions, such as PPARD/FANCE, CYP19A1 and ACAN, for which the possibility of rare functional mutations cannot be excluded. For example, exome sequencing recently identified amino acid-altering variants in ACAN influencing human height (Kim et al. 2012). In addition, our data support the recent significant findings for the GH-related genes (Hendriks et al. 2011a) and HMGA2 (Hendriks et al. 2011b) on extreme tall stature, although their genetic effects appeared to be smaller in our sample. Overall, these data show that the genetics of tall stature is similar to that of normal height in humans, i.e., heritability is for a large part explained by many common variants with small effects and allelic heterogeneity is a frequent feature.

In summary, we showed that a large number of DNA variants previously implicated in normal height variation in Europeans are also involved in determining tall stature of Europeans. The obtained genomic prediction accuracy is considerably improved compared to a previous attempt. There is, however, still room for further improvement, which may partly rely on modeling the genetic interactions and allelic heterogeneities within those height-associated loci/regions. Nevertheless, the achieved accuracy is already relevant to fields of future practical applications of genomic height prediction such as pediatrics and forensics, in that it clearly demonstrates the potential of DNA for predicting height, at least in its extremely tall form. Finally, extrapolating from our results suggests that the genomic prediction of at least the extreme forms of common complex traits in humans, including common diseases, is likely to be informative if large numbers of trait-associated common DNA variants are available.