A Variant in LIN28B Is Associated with 2D:4D Finger-Length Ratio, a Putative Retrospective Biomarker of Prenatal Testosterone Exposure

The ratio of the lengths of an individual’s second to fourth digit (2D:4D) is commonly used as a noninvasive retrospective biomarker for prenatal androgen exposure. In order to identify the genetic determinants of 2D:4D, we applied a genome-wide association approach to 1507 11-year-old children from the Avon Longitudinal Study of Parents and Children (ALSPAC) in whom 2D:4D ratio had been measured, as well as a sample of 1382 12- to 16-year-olds from the Brisbane Adolescent Twin Study. A meta-analysis of the two scans identified a single variant in the LIN28B gene that was strongly associated with 2D:4D (rs314277: p = 4.1 × 10⁻⁸) and was subsequently independently replicated in an additional 3659 children from the ALSPAC cohort (p = 1.53 × 10⁻⁶). The minor allele of the rs314277 variant has previously been linked to increased height and delayed age at menarche, but in our study it was associated with increased 2D:4D in the direction opposite to that of previous reports on the correlation between 2D:4D and age at menarche. Our findings call into question the validity of 2D:4D as a simplistic retrospective biomarker for prenatal testosterone exposure.


Introduction
The ratio of the lengths of the second to fourth digits (2D:4D) is a sexually dimorphic trait that is on average a quarter of a standard deviation lower in males than in females. 1 First identified by Ecker in 1875, 2 the measure was rediscovered by Wilson in the early 1980s 3 and subsequently by Manning, who hypothesized that the ratio reflected prenatal androgen exposure. 4,5 Consistent with this theory, sex differences in 2D:4D develop prenatally 6 and remain relatively stable across the lifespan. 7 Given the practical and ethical difficulties inherent in measuring testosterone exposure in the developing fetus, many researchers have adopted 2D:4D as a noninvasive retrospective biomarker for prenatal androgen exposure, although its use as such is controversial (see McIntyre et al. for a review 8). Despite this controversy, over 300 papers using this measure have been published in the last 10 years, 9 and 2D:4D has been shown to correlate with a wide range of diseases and physiological and psychological traits, including autism, 10 attention deficit disorder, 11 fertility, 12 myocardial infarction, 13 visuo-spatial ability, 14 homosexuality, 15,16 athletic performance, 17 and age at menarche. 18 2D:4D is highly heritable, with additive genetic effects explaining ~60% of the phenotypic variance. 19-22 Although the results from one twin study suggested that female twins exhibit higher heritabilities than males for left-hand 2D:4D, 19 a larger study failed to find any significant sex limitation or differences in the magnitude of heritability between the sexes for 2D:4D traits. 20 There has also been little progress in identifying the individual variants underlying this genetic variation. Following the hypothesis that 2D:4D reflects prenatal exposure to testosterone, Manning et al. 23 examined the association between the number of CAG repeats at the androgen receptor locus (AR [MIM 313700]) and 2D:4D in males (n = 51) and reported a significant correlation between CAG repeat number and 2D:4D for the right (r = 0.29) but not the left (r = 0.005) hand. To the best of our knowledge, no other genetic association studies using this trait have been performed to date.
In order to identify genetic determinants underlying variation in 2D:4D, we applied a genome-wide association approach to study 1507 children from the Avon Longitudinal Study of Parents and Children (ALSPAC), a large population-based cohort in which 2D:4D had been measured at ~11 years of age, 24 as well as a sample of 1382 12-, 14-, and 16-year-old twins and their singleton siblings from the Queensland Institute of Medical Research, Brisbane Adolescent Twin Study (QIMR, Australia). 20,25

Subjects and Methods
Participants
ALSPAC is a population-based birth cohort study consisting of over 13,000 women and their children recruited from the county of Avon, UK, in the early 1990s. 24 Both mothers and children have been extensively followed from the eighth gestational week onward with a combination of self-reported questionnaires, medical records, and physical examinations. Biological samples including DNA have been collected for 10,121 of the children from this cohort. The discovery sample reported in this study comprised 1507 children who had their 2D:4D measured at 11 years of age (mean age = 11.75 years) and for whom genome-wide SNP typing had been performed. 26 The replication sample consisted of an additional 3659 children from ALSPAC who were not part of the initial discovery cohort.
Participants in the QIMR Brisbane Adolescent Twin Study were recruited from the general population, in the context of ongoing studies of melanoma risk factors and studies of cognition. 27 Twins and their singleton siblings were enlisted by contacting the principals of primary schools in the greater Brisbane area, by media appeals, and by word of mouth. It is estimated that approximately 50% of the eligible birth cohort were recruited into the study, which began in 1992. Digit ratios were available for 1382 individuals with genome-wide association data from 671 families (comprising 169 singletons, 332 sibling pairs, 135 sibling trios, 31 sibling quads, and 4 sibling quins). The age range for the sample was 11-24 years (mean = 15.46, standard deviation [SD] = 3.27).
In both samples, participants' hands were photocopied during a clinical visit, and measurements of the second and fourth fingers were taken from the photocopies with the use of digital calipers (accurate to 0.1 mm). The 2D:4D was calculated as the length of the second digit divided by the length of the fourth digit, multiplied by 100 so as to avoid computational difficulties due to the low variance of the trait. In both samples, the measure was normally distributed, so no further transformation was required.
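The ratio computation described above is simple enough to state as code. The sketch below assumes the two finger lengths are already available in millimetres; the function names `digit_ratio` and `mean_digit_ratio` are hypothetical and are not taken from either study's pipeline.

```python
from __future__ import annotations


def digit_ratio(d2_mm: float, d4_mm: float) -> float:
    """Return 2D:4D multiplied by 100, as described in the text.

    d2_mm and d4_mm are the second- and fourth-digit lengths (mm),
    e.g. as read from a hand photocopy with digital calipers.
    """
    if d2_mm <= 0 or d4_mm <= 0:
        raise ValueError("finger lengths must be positive")
    return 100.0 * d2_mm / d4_mm


def mean_digit_ratio(left: tuple[float, float], right: tuple[float, float]) -> float:
    """Mean of the left- and right-hand ratios (the third trait analysed)."""
    return (digit_ratio(*left) + digit_ratio(*right)) / 2.0


# Hypothetical measurements (mm) for one child: (2D, 4D) per hand.
print(mean_digit_ratio(left=(68.4, 71.2), right=(67.9, 70.8)))
```

Scaling by 100 changes only the units of regression coefficients (a per-allele effect of 0.6 on this scale is 0.006 on the raw ratio); it does not change test statistics.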
In ALSPAC, a random sample of 57 right and 48 left hands was measured in vivo to establish the validity of using photocopy measurements to assess 2D:4D. Similarly, in the QIMR cohort, 680 hands were measured twice from the same hand photocopy, once by hand with digital calipers and once with a computer-assisted measurement program, and the reliability between measurement occasions was calculated.
Children's standing height in ALSPAC was measured with a Harpenden Stadiometer. Both studies were performed with the approval of the appropriate ethics committees, and informed consent was obtained from all participants and their parents.

Genotyping
A total of 1543 ALSPAC children were initially genotyped at 317,504 SNPs on the Illumina HumanHap 317K SNP chip. Individuals exhibiting cryptic relatedness, non-European ancestry, or high genome-wide heterozygosity and/or missing-genotype rates were removed from analyses as described previously, 26 leaving 1507 individuals with 2D:4D measurements in the analysis. Markers with minor allele frequency (MAF) < 1%, SNPs with > 5% missing genotypes, and any marker that failed an exact test of Hardy-Weinberg equilibrium (HWE; p < 5 × 10⁻⁷) were excluded from further analysis, leaving 310,613 SNPs that passed quality control.
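The marker-level filters above can be sketched as follows. This is a hypothetical illustration and not the ALSPAC pipeline. It assumes per-SNP genotype counts have already been tallied, and it uses a Python port of the SNP-HWE exact test of Wigginton et al., the exact test implemented in PLINK. The source does not say which implementation was actually used. `SnpCounts` and `passes_qc` are invented names.

```python
from dataclasses import dataclass


def hwe_exact_p(n_het: int, n_hom1: int, n_hom2: int) -> float:
    """Exact Hardy-Weinberg test p value (Wigginton-style recurrence)."""
    n_homr, n_homc = sorted((n_hom1, n_hom2))   # rarer / commoner homozygote
    rare = 2 * n_homr + n_het                   # copies of the minor allele
    n = n_het + n_homr + n_homc                 # called genotypes
    if n == 0:
        return 1.0
    probs = [0.0] * (rare + 1)
    # Start near the expected heterozygote count, with the same parity as `rare`.
    mid = rare * (2 * n - rare) // (2 * n)
    if (rare - mid) % 2:
        mid += 1
    probs[mid] = 1.0
    total = 1.0
    # Recurrence towards fewer heterozygotes.
    hets, homr = mid, (rare - mid) // 2
    homc = n - hets - homr
    while hets > 1:
        probs[hets - 2] = probs[hets] * hets * (hets - 1) / (4.0 * (homr + 1) * (homc + 1))
        total += probs[hets - 2]
        hets, homr, homc = hets - 2, homr + 1, homc + 1
    # Recurrence towards more heterozygotes.
    hets, homr = mid, (rare - mid) // 2
    homc = n - hets - homr
    while hets <= rare - 2:
        probs[hets + 2] = probs[hets] * 4.0 * homr * homc / ((hets + 2) * (hets + 1))
        total += probs[hets + 2]
        hets, homr, homc = hets + 2, homr - 1, homc - 1
    # Sum probabilities of all configurations no more likely than the observed one.
    p_obs = probs[n_het]
    return min(1.0, sum(p for p in probs if p <= p_obs) / total)


@dataclass
class SnpCounts:
    snp_id: str
    n_hom1: int      # homozygous for allele 1
    n_het: int
    n_hom2: int      # homozygous for allele 2
    n_missing: int   # failed calls


def passes_qc(s: SnpCounts, min_maf: float = 0.01, max_missing: float = 0.05,
              min_hwe_p: float = 5e-7) -> bool:
    """Apply the three marker filters described in the text (thresholds as stated)."""
    called = s.n_hom1 + s.n_het + s.n_hom2
    if called == 0:
        return False
    if s.n_missing / (called + s.n_missing) > max_missing:
        return False
    minor_copies = min(2 * s.n_hom1 + s.n_het, 2 * s.n_hom2 + s.n_het)
    if minor_copies / (2 * called) < min_maf:
        return False
    return hwe_exact_p(s.n_het, s.n_hom1, s.n_hom2) >= min_hwe_p
```

An exact test is preferred over the chi-square HWE test here because it stays valid for rare alleles, where expected genotype counts are small.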
The QIMR participants analyzed here were genotyped on the Illumina Human610-Quad SNP chip. These samples were genotyped in the context of a larger GWAS, which resulted in the genotyping of 16,140 individuals 27 with the use of the Illumina 317, 370, and 610 SNP chips. Genotype data were screened for genotyping quality (GenCall < 0.7), SNP and individual call rates (< 0.95), HWE failure (p < 10⁻⁶), and MAF (< 0.01). Because these samples were genotyped in the context of a larger project, the data were integrated with the larger QIMR genotype project and checked for pedigree, sex, and Mendelian errors and for non-European ancestry. Because the QIMR genotyping project included data from the 317, 370, and 610 chip sets, a set of SNPs common to the three genotyping platforms was used for imputation (n = 274,604) to avoid introducing bias into the imputed data.
Follow-up genotyping of two SNPs in the LIN28B gene (MIM 611044), rs314277 and rs314276, was carried out in an additional 5129 individuals from the ALSPAC cohort by K-Biosciences, who employ a novel form of competitive allele-specific PCR (KASPar) and the TaqMan™ system for genotyping. The rs314277 SNP was chosen for replication because it showed maximum association in the discovery cohort, whereas rs314276 was chosen because it had previously shown association with traits correlated with 2D:4D, such as age at menarche. 28

Statistical Analyses
Because ALSPAC and QIMR samples were genotyped on different arrays, consensus autosomal genotypic data were imputed with Markov Chain Haplotyping software (MACH) with the use of phased data from CEU individuals from release 22 of the HapMap project as the reference set of haplotypes. Only SNPs that could be imputed with relatively high confidence (R² > 0.3) and had a MAF > 1% were used in subsequent analyses. In the ALSPAC cohort, association analysis of imputed SNPs was performed assuming an underlying additive model with the software package MACH2QTL, which accounts for uncertainty in the imputed genotypes by weighting genotypes by their estimated posterior probabilities. In the QIMR twin study, the most likely genotype was imputed at each locus, and these genotypes were subsequently analyzed with MERLIN. 29 Markers at physically genotyped loci on the X chromosome were analyzed with PLINK in the ALSPAC sample 30 and MINX in the QIMR cohort. 29 SNPs were tested for association with right-hand 2D:4D, left-hand 2D:4D, and the mean of left and right values. All analyses included sex as a covariate. Results for the two cohorts were then combined with fixed-effects inverse-variance meta-analysis. We also performed a chi-square test for heterogeneity to test whether the regression coefficients of 2D:4D on genotype differed significantly between males and females for all SNPs in the discovery GWAS (i.e., a test for additive genotype × sex interaction). The p values for the tests of heterogeneity were combined across ALSPAC and QIMR cohorts with the software package METAL to produce an overall level of significance. Association in the replication sample was tested in PLINK via linear regression assuming an underlying additive genetic model.
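The two combination steps named above are sketched here, assuming each cohort (or each sex) supplies a regression coefficient and its standard error. The fixed-effects inverse-variance estimate weights each cohort by 1/SE². The sex-heterogeneity test compares male and female coefficients with a 1-df chi-square statistic. This is not METAL's code, the METAL step that combines heterogeneity p values across cohorts is not shown, and the numbers in the usage lines are invented.

```python
import math


def two_sided_p(z: float) -> float:
    """Two-sided p value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def ivw_fixed_effects(betas, ses):
    """Fixed-effects inverse-variance meta-analysis of per-cohort estimates."""
    weights = [1.0 / se ** 2 for se in ses]
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    return beta, se, two_sided_p(beta / se)


def sex_difference_test(b_male, se_male, b_female, se_female):
    """1-df chi-square test that male and female coefficients are equal."""
    chi2 = (b_male - b_female) ** 2 / (se_male ** 2 + se_female ** 2)
    # Upper tail of chi-square(1) equals the two-sided normal tail at sqrt(chi2).
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical per-cohort estimates (e.g. ALSPAC, QIMR) for one SNP.
beta, se, p = ivw_fixed_effects([0.60, 0.55], [0.15, 0.16])
chi2, p_sex = sex_difference_test(0.70, 0.20, 0.50, 0.21)
```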

Results
The Pearson product-moment correlation between the in vivo and photocopy measurements of the lengths of the second and fourth digits was high for both the right and left hands (all r > 0.97). The correlations between finger lengths measured with digital calipers and those measured with the computer-assisted program were similarly high (all r > 0.96), suggesting that our measurements have a high degree of repeatability. Table 1 displays the mean and SD of the 2D:4D measurements for the left hand, right hand, and mean of both hands in the ALSPAC and QIMR discovery cohorts, as well as in the ALSPAC replication sample. As expected, across samples and hands, the mean 2D:4D was higher for females than for males. Quantile-quantile (Q-Q) plots for the genome-wide association scan of ALSPAC individuals, the GWAS of the QIMR twins, and the combined meta-analyses are presented in Figure S1, available online. These plots indicate that the observed GWAS test statistics lie close to expectation and suggest that potential technical and stratification artifacts had negligible impact on the results. Consistent with this interpretation, the genomic inflation factors in the ALSPAC (λ = 1.01), QIMR (λ = 1.01), and meta-analyzed (λ = 1.00) samples indicate little inflation of the association test statistics. We also checked whether p values derived from markers on the X chromosome might deviate more strongly from the null hypothesis of no association than those from the genome as a whole. Q-Q plots of the combined meta-analyses for left-hand, right-hand, and mean 2D:4D showed little evidence that this was the case, although a few markers on the X chromosome did exceed null expectations for left-hand 2D:4D (Figure S2).
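The genomic inflation factor λ quoted above is conventionally the median of the observed 1-df association chi-square statistics divided by the median of the null chi-square(1) distribution (about 0.455). The sketch below assumes two-sided p values as input. The function name is hypothetical, and the source does not state exactly how λ was computed.

```python
from statistics import NormalDist, median


def genomic_inflation(p_values):
    """Genomic-control lambda from two-sided 1-df association p values."""
    nd = NormalDist()
    # Convert each p value back to a 1-df chi-square statistic (z squared).
    chi2 = [nd.inv_cdf(p / 2.0) ** 2 for p in p_values]
    # Median of the chi-square(1) distribution, about 0.455.
    null_median = nd.inv_cdf(0.75) ** 2
    return median(chi2) / null_median


# Under the null hypothesis, p values are uniform and lambda should be close to 1.
uniform_p = [(i + 0.5) / 1000 for i in range(1000)]
lam = genomic_inflation(uniform_p)
```

Values well above 1 would point to population stratification or technical artifacts. Values of 1.00 to 1.01, as reported here, indicate essentially none.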
The genome-wide association results for the combined meta-analysis of left-hand, right-hand, and mean 2D:4D are presented in Figures S3-S5. A single SNP, rs314277, in the LIN28B gene (Figure 1) reached genome-wide significance for mean 2D:4D (p = 4.1 × 10⁻⁸), as well as suggestive significance for left-hand (p = 1.5 × 10⁻⁶) and right-hand (p = 8.2 × 10⁻⁷) 2D:4D. Each copy of the minor allele was associated with a 0.6-unit increase in mean 2D:4D (on the ×100 scale). No other imputed or genotyped SNP in this region met the criterion for genome-wide significance, although several SNPs, including rs314276, which had shown association with pubertal development and height in previous studies, 28 showed nominal evidence of association with 2D:4D (left: p = 7.9 × 10⁻⁵; right: p = 1 × 10⁻³; mean: p = 5.4 × 10⁻⁵). Interestingly, although the recombination rate within 50 kb on either side of rs314277 was low, few SNPs were in appreciable linkage disequilibrium (r²) with the marker, which may explain why the next most strongly associated SNP had p values at least two orders of magnitude larger. Conditioning on rs314277 in both the ALSPAC and QIMR data sets reduced the signal at the surrounding loci but did not completely abolish all evidence of association (ALSPAC best p = 0.0021; QIMR best p = 0.0069). This suggests either the existence of a more strongly associated variant in the region that had not been imputed or a smaller second signal, independent of rs314277, associated with 2D:4D in these data.
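Conditioning on a lead SNP amounts to adding its genotype dosage as a covariate in the regression for each neighbouring marker. The sketch below illustrates the idea on simulated data only: the allele frequency, effect size, LD, and sex coding are invented. It uses ordinary least squares with a normal approximation for the p value and is not the MACH2QTL or MERLIN procedure used in the study.

```python
import math

import numpy as np


def snp_effect(y, snp, covariates):
    """OLS of y on intercept + snp + covariates; returns (beta, two-sided p) for snp.

    Uses a normal approximation to the t distribution, adequate for n in the thousands.
    """
    X = np.column_stack([np.ones(len(y)), snp] + list(covariates))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    se = math.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return float(coef[1]), math.erfc(abs(coef[1] / se) / math.sqrt(2.0))


rng = np.random.default_rng(1)
n = 5000
sex = rng.integers(0, 2, n).astype(float)        # hypothetical 0/1 coding
lead = rng.binomial(2, 0.3, n).astype(float)      # dosage at the lead SNP
# Nearby SNP in LD: copies the lead genotype for ~80% of individuals.
nearby = np.where(rng.random(n) < 0.8, lead, rng.binomial(2, 0.3, n)).astype(float)
# Simulated 2D:4D x 100 in which only the lead SNP (and sex) has a true effect.
y = 96.0 + 0.6 * lead + 1.0 * sex + rng.normal(0.0, 3.0, n)

b_marg, p_marg = snp_effect(y, nearby, [sex])
b_cond, p_cond = snp_effect(y, nearby, [lead, sex])
print(f"nearby SNP alone: beta={b_marg:.2f}, p={p_marg:.1e}")
print(f"conditional on lead SNP: beta={b_cond:.2f}, p={p_cond:.2f}")
```

In this simulation the nearby marker is strongly associated on its own, but its signal largely vanishes once the lead SNP is included, as expected when an association is due only to LD. Residual association after conditioning, as reported for ALSPAC and QIMR, is what instead hints at an untyped variant or a second signal.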
There was close correspondence in the top hits between left and right 2D:4D, which is not surprising, given the moderate to high phenotypic correlation between these variables (ALSPAC: r = 0.69; QIMR: r = 0.56). We therefore present only the results for mean 2D:4D in the main text. Tables S1-S3 list all SNPs with a combined p value of less than 1 × 10⁻⁵ for the meta-analysis of left-hand, right-hand, and mean 2D:4D. These […] p = 0.0015 for rs2857533), nor in the HOXA cluster (all p > 0.05), which play important roles in limb development.
We also performed a chi-square test for heterogeneity to test whether the regression coefficients differed significantly between males and females. No variants exhibited significant differences in the magnitude of the regression coefficients between the sexes, although this may be partially a consequence of the low power of these tests to detect interactions. A complete list of variants with p values less than 10⁻⁵ is displayed in Tables S4-S6.
We attempted to replicate the LIN28B association by genotyping both rs314277 and rs314276 in an additional 5129 children from the ALSPAC cohort (Table 2). Whereas rs314277 was strongly associated with 2D:4D in the replication cohort (β = 0.44; 95% confidence interval: 0.26-0.62; p = 1.53 × 10⁻⁶), rs314276 showed only nominal association (p = 2.26 × 10⁻⁴). Indeed, conditioning on rs314277 suggested that the association at rs314276 could be entirely explained by the signal at rs314277 (p = 0.202). The association between rs314277 and 2D:4D also remained after conditioning on height (p = 1.64 × 10⁻⁶). There was no evidence for interaction between genotype at rs314277 and sex in the replication cohort (p = 0.31). Because 2D:4D has been hypothesized to reflect testosterone exposure in utero, we investigated a possible relationship between mother's genotype at rs314277 and offspring 2D:4D. Although there was some evidence for a relationship between maternal genotype and mean 2D:4D in offspring (p = 0.05), any hint of association disappeared after conditioning on the child's genotype (p = 0.99), suggesting that the mother's genotype at rs314277 did not directly influence the child's 2D:4D.
Table 2 note: In each case, the minor allele is listed first in the table. Beta coefficients indicate the expected change in 2D:4D per additional minor allele.
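As a rough consistency check on the replication estimate, a symmetric 95% confidence interval implies a standard error of (upper - lower)/(2 × 1.96), from which a Wald p value follows. The sketch below assumes a normal approximation. Because the published bounds are rounded to two decimals, the recovered p value can only approximately match the reported one. `p_from_ci` is a hypothetical helper.

```python
import math
from statistics import NormalDist


def p_from_ci(beta, lower, upper, level=0.95):
    """Recover the SE and a two-sided Wald p value from a symmetric confidence interval."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2.0)   # about 1.96 for a 95% CI
    se = (upper - lower) / (2.0 * z_crit)
    z = beta / se
    return se, math.erfc(abs(z) / math.sqrt(2.0))


# Replication estimate for rs314277 as reported in the text.
se, p = p_from_ci(0.44, 0.26, 0.62)
print(f"SE ~ {se:.3f}, p ~ {p:.1e}")
```

For β = 0.44 with interval 0.26-0.62 this gives an SE of roughly 0.09 and p of roughly 1.7 × 10⁻⁶, in line with the reported 1.53 × 10⁻⁶.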
Finally, given the previous association between rs314277 and height, we used the online GWAS catalog (see Web Resources) to explore whether any of the confirmed variants from 11 different GWAS of height and four GWAS of age at menarche were associated with 2D:4D in our meta-analysis. Apart from SNPs located at 6q21, the only other variants that showed nominal association with 2D:4D were rs4932217, near the gene POLG (MIM 174763; p = 0.01), and rs2292303, in the gene NUP37 (MIM 609264; p = 0.05). There was no consistent overlap in the direction of effect across these loci (i.e., variants associated with increased height were not systematically associated with either increased or decreased 2D:4D). Similarly, SNPs in the 9q31 region that have previously been associated with age at menarche 31 did not show association with mean 2D:4D (all p > 0.05). A complete list of variants previously associated with height and age at menarche is presented in Table S7, along with results from the current combined meta-analysis of mean 2D:4D.

Discussion
The ratio of the lengths of an individual's second and fourth digits (2D:4D) is sexually dimorphic and has been frequently used as a noninvasive retrospective biomarker for prenatal androgen exposure. 9 2D:4D has been shown to correlate with a wide range of diseases and physiological and psychological traits, including autism, 10 attention deficit disorder, 11 fertility, 12 myocardial infarction, 13 visuo-spatial ability, 14 homosexuality, 15,16 athletic performance, 17 and age at menarche. 18 In this study, we identified a variant located within intron 2 of the LIN28B gene, rs314277, which was robustly associated with 2D:4D.
LIN28B is the human ortholog of a gene that regulates developmental timing in Caenorhabditis elegans. Its product, LIN28-B, is an RNA-binding protein that interacts directly with let-7 precursors, preventing their processing into mature miRNAs. 32 Polymorphisms within LIN28B have previously been associated with height 33 and age at menarche in females. 28,34,35 Most recently, Viswanathan et al. demonstrated that activation of LIN28B promotes neoplastic transformation and is associated with aggressive forms of human malignancy. 36 Although it is unclear at present how variation in LIN28B might influence 2D:4D, it is noteworthy that the gene is highly expressed in the testis, placenta, and fetal liver, 34,37 suggesting that LIN28B might influence 2D:4D early in development.
The association between rs314277 in LIN28B and 2D:4D is particularly interesting because the minor allele has previously been associated with increased height 33 and delayed menarche in females. 35 The direction of effect in these studies is consistent with the overall correlation between the variables, because girls who experience menarche later tend to be taller as adults than girls who reach puberty earlier. 38 Similarly, a recent study of 2D:4D and age at menarche found that girls with low right-hand 2D:4D (although not left-hand 2D:4D) tended to experience delayed menarche. 18 In the present study, however, the minor allele at rs314277 was associated with increased 2D:4D ratio, which is opposite to the direction predicted by these earlier reports. It is unclear why this might be the case, but it suggests that the relationship between 2D:4D, age at menarche, and height is complex. Given that the association between rs314277 and 2D:4D remained after conditioning on height, and given that no other SNPs that displayed association with height or with age at menarche showed convincing evidence of association with 2D:4D, it is unlikely that the effect of rs314277 is mediated through these other variables. It is interesting to note that LIN28B has two isoforms distinguished by the presence or truncation of a conserved cold-shock domain, which influences protein function. 38 It is possible that different isoforms of LIN28B and hence different biological pathways might influence 2D:4D, height, and age at menarche, helping to explain the interesting pattern of correlations.
Our study is not the only report to have questioned the specificity of 2D:4D as a proxy for androgen exposure. In fact, the relationship between 2D:4D and prenatal androgen levels has been directly examined in only one small sample (n = 33). That study found no significant relationship between testosterone levels in amniotic fluid (at 17 weeks of gestation) and 2D:4D (at 24 months), but it did find a correlation between 2D:4D and the ratio of fetal testosterone to estradiol, 4,39 suggesting that the relationship between 2D:4D and circulating sex steroids might be more complex than previously hypothesized. In this study, we did not find evidence of a relationship between genetic variants in or near the androgen receptor gene and 2D:4D, although it is unclear how well the SNPs in this study tag CAG repeat variation at the androgen receptor.
In conclusion, we have demonstrated that a variant within the LIN28B gene is associated with 2D:4D and is important in the early development of the hands. The same variant is associated with height and age at menarche, but its association with 2D:4D was in the opposite direction to that predicted from an earlier study of 2D:4D and age at menarche. Our results suggest that the relationship between 2D:4D, early development, and fetal androgen exposure is likely to be more complex than previously appreciated.

Supplemental Data
Supplemental Data include seven tables and five figures and can be found with this article online at http://www.ajhg.org.