Ancestral diversity improves discovery and fine-mapping of genetic loci for anthropometric traits—The Hispanic/Latino Anthropometry Consortium

Hispanic/Latinos have been underrepresented in genome-wide association studies (GWAS) for anthropometric traits despite their notable anthropometric variability, diverse ancestry proportions, and high burden of growth stunting and overweight/obesity. To address this knowledge gap, we analyzed densely imputed genetic data in a sample of Hispanic/Latino adults to identify and fine-map genetic variants associated with body mass index (BMI), height, and BMI-adjusted waist-to-hip ratio (WHRadjBMI). We conducted a GWAS of 18 studies/consortia as part of the Hispanic/Latino Anthropometry (HISLA) Consortium (stage 1, n = 59,771) and generalized our findings in 9 additional studies (stage 2, n = 10,538). We also conducted a trans-ancestral GWAS meta-analysis combining summary statistics from HISLA stage 1 with those of existing consortia of European and African ancestries. In our HISLA stage 1 + 2 analyses, we discovered one BMI locus, as well as two additional BMI signals and one additional height signal within established anthropometric loci. In our trans-ancestral meta-analysis, we discovered three BMI loci, one height locus, and one WHRadjBMI locus. We also identified 3 secondary signals for BMI, 28 for height, and 2 for WHRadjBMI in established loci. We show that 336 known BMI, 1,177 known height, and 143 known WHRadjBMI (sexes combined) SNPs demonstrated suggestive transferability (nominal significance and directional consistency of the effect estimate) in Hispanic/Latino adults. Of these, 36 BMI, 124 height, and 11 WHRadjBMI SNPs were significant after trait-specific Bonferroni correction. Trans-ancestral meta-analysis of the three ancestries showed a small-to-moderate impact of uncorrected population stratification on the resulting effect size estimates. Our findings demonstrate that future studies may also benefit from leveraging diverse ancestries and differences in linkage disequilibrium patterns to discover novel loci and additional signals with less residual population stratification.


Introduction
A complex interplay between political, social, and economic factors has led to an increasingly obesogenic global environment in which many low-to-middle income nations have experienced a rapid transition from under-nutrition and growth stunting to over-nutrition and obesity. 1 In Latin America, by 2016, 35% of the total population was overweight (body mass index [BMI] 25 to <30 kg/m²) and another 23% was living with obesity (BMI ≥ 30 kg/m²). 2 In Mexico, it is projected that by 2050 only 12% of men and 9% of women will have a healthy weight (BMI < 25 kg/m²). 3 In South America in 2010-2011, the prevalence of obesity was 36%, but abdominal obesity (based on waist circumference) was even more common (53%). 4 Ancestry may also play a role in anthropometric-related health disparities in Hispanic/Latino populations. Previous studies have described the historical contexts leading to admixture in Latin American populations, 5,6 which is characterized by highly variable ancestral proportions [7][8][9] from any of the following regions: the Americas, Europe, Africa, and East Asia. [10][11][12][13][14][15] The proportion of Native American ancestry is associated with obesity-related traits, and even more strongly with height. 16,17 Height is inversely associated with the proportion of Native American ancestry, even after accounting for the secular increase in height across populations worldwide, which is driven mainly by non-genetic nutritional factors. 16 The ultimate drivers of this association remain unclear; genetic factors and/or socio-economic factors strongly associated with Native American ancestry could be responsible. Recent studies are starting to provide relevant insights into this topic, including a genome-wide association study (GWAS) in Peru 18 that identified a missense variant in the FBN1 gene (rs200342067) with the largest effect size described so far for common height-associated variants in human populations.
In the 1000 Genomes Project samples, rs200342067 is present in only two Latin American populations (MXL, 0.78%; and PEL, 4.12%), yet the authors reported that this missense variant shows subtle evidence of positive selection in the Peruvian population. 18 In the US, as in other high-income nations, both the size of the Hispanic/Latino population and the diversity of its national origins (backgrounds) have been increasing over the past several decades, 19 and Hispanic/Latinos are projected to make up 24% of the US adult population by 2065. 19 US Hispanic/Latino adults and children/adolescents face a greater burden of obesity than their non-Hispanic white counterparts. [20][21][22][23] Thus, there is a need to study Hispanic/Latino populations to fully address these disparities. 23,24 Specifically, we sought to understand the role that Native American or other under-studied components of admixture play in the genetic architecture of anthropometric traits in Hispanic/Latinos, and their relationship with gene expression. To date, no large-scale GWAS of anthropometric traits has been conducted among Hispanic/Latino populations; we therefore performed a large-scale genomic study of multiple anthropometric traits, including BMI, height, and waist-to-hip ratio adjusted for BMI (WHRadjBMI), in Hispanic/Latino populations to describe potentially novel loci, or new signals in established loci, in this population.

Materials and methods
Hispanic/Latino study samples
The Hispanic/Latino Anthropometry (HISLA) Consortium comprises 27 studies/consortia of adult participants. First, HISLA stage 1 includes 17 studies and one consortium (Consortium for the Analysis of the Diversity and Evolution of Latin America [CANDELA] 17 ) collectively representing up to 59,771 adults, depending on the trait, from Brazil, Chile, Colombia, Mexico, Peru, or the US with self-reported heritage from across Spanish-speaking Latin America, or Native American heritage, primarily Pima and Zuni 25 (Table S1). HISLA stage 2 includes 9 studies with up to 10,538 adults from across Spanish-speaking Latin America or with related heritage and living in the US (Table S1).
This study was approved by the institutional review board of the University of North Carolina at Chapel Hill, and all contributing studies had received prior institutional review board approval for each study's activities.

Anthropometric traits
BMI is a commonly used index of obesity risk and is calculated as body weight divided by height squared (kg/m²). Adult height was measured or self-reported in either metric or US units and then converted to meters. Waist-to-hip ratio (WHR) is used to capture central fat deposition and is calculated as the circumference of the waist at the umbilicus divided by the circumference of the hip at the maximum protrusion of the gluteal muscles.
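As a small illustration of the definitions above, the sketch below computes height in meters, BMI, and WHR from raw measurements. The helper names and the example participant are hypothetical and are not taken from any HISLA study protocol; only the formulas (weight / height², waist / hip) and the exact 0.0254 m per inch conversion are assumed.

```python
# Minimal sketch: BMI and WHR from raw measurements (helper names and values are hypothetical).
INCH_TO_M = 0.0254  # exact conversion factor from inches to meters


def height_in_meters(value: float, unit: str) -> float:
    """Convert a height recorded in centimeters, meters, or inches to meters."""
    factors = {"cm": 0.01, "m": 1.0, "in": INCH_TO_M}
    return value * factors[unit]


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: body weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2


def waist_to_hip_ratio(waist: float, hip: float) -> float:
    """Waist circumference divided by hip circumference (same units for both)."""
    return waist / hip


# Hypothetical participant: 65 in tall, 70 kg, waist 90 cm, hip 100 cm.
h = height_in_meters(65, "in")
print(f"height = {h:.3f} m, BMI = {bmi(70, h):.1f} kg/m^2, WHR = {waist_to_hip_ratio(90, 100):.2f}")
```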
Residuals were calculated separately by sex and/or case status, adjusting for age, age², and study-specific covariates (e.g., center; principal components [PCs]). For WHR, we also adjusted for BMI when creating the residuals to isolate central fat deposition from overall body mass. Residuals were then inverse normalized for BMI and WHRadjBMI, and converted to Z scores for height (= residual / standard deviation of all residuals). In family-based studies, the residuals were calculated in women and men together, adjusting for age, sex, and other study covariates including PCs. Descriptive statistics on the covariates and anthropometric measures are provided for each study's analytic sample in Table S2. Only one family-based study in stage 1 and two non-family-based studies in stage 2 (Genetics of Latinos Diabetic Retinopathy, 0.3% aged <18 years; and Mapping the Genes for Hypertension, Insulin Resistance, and Salt Sensitivity Study, 3.9%) included a small subset of adolescents aged 15-17 years, each comprising less than 5% of the total sample. All other study samples included individuals aged 18-98 years.
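The residual-then-transform workflow above can be sketched as follows for a single sex stratum. The text does not state which rank offset was used for the inverse normalization, so the Blom offset (c = 3/8) is an assumption here, and the data are simulated rather than drawn from any contributing study.

```python
import numpy as np
from scipy.stats import norm, rankdata


def residualize(y, covariates):
    """Residuals of y after OLS adjustment for an intercept plus covariates."""
    X = np.column_stack([np.ones(len(y)), covariates])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coef


def inverse_normalize(resid, c=3.0 / 8.0):
    """Rank-based inverse normal transform; the Blom offset c = 3/8 is an assumption."""
    ranks = rankdata(resid)  # ties receive their average rank
    return norm.ppf((ranks - c) / (len(resid) - 2 * c + 1))


def z_score(resid):
    """Residual divided by the standard deviation of all residuals (used for height)."""
    return resid / np.std(resid, ddof=1)


# Simulated (not study) data: residualize BMI on age and age^2 within one sex stratum.
rng = np.random.default_rng(1)
age = rng.uniform(18, 80, size=500)
bmi = 24 + 0.06 * age + rng.normal(0, 4, size=500)
res = residualize(bmi, np.column_stack([age, age ** 2]))
bmi_inv = inverse_normalize(res)
```

Because the transform depends only on ranks, it preserves the ordering of the residuals while forcing an approximately standard normal distribution, which limits the influence of extreme BMI or WHR values.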
Childhood/adolescence study samples, anthropometric traits, and obesity
We assembled an independent sample of children/adolescents with anthropometric measures from three studies from the US, Mexico, and Chile (Table S3). The distribution of covariates and anthropometric measures in the samples of children/adolescents for each analysis is described in Table S4. First, childhood/adolescent obesity was defined as ≥95th BMI-for-age percentile (versus ≤50th BMI-for-age percentile for controls), based on the Centers for Disease Control and Prevention growth curves, 26 as in previous analyses of childhood obesity. 27 This resulted in 1,814 children/adolescents aged 2-18 years in a case-control analysis of childhood obesity (Tables S3 and S4). Second, BMI- and height-for-age Z scores were calculated in children/adolescents aged 5-18 years from the US and Chile (Table S4) based on the reference growth curves of the World Health Organization, which are more international in scope. 28 In Viva la Familia, a family-based study, 29 these residuals were calculated adjusting for sex in the combined sample. The resulting BMI- and height-for-age Z scores were available for 1,914 and 1,945 children/adolescents, respectively. We used these two analyses to look up the BMI and height findings from our adult HISLA meta-analysis as well as our trans-ancestral analyses.
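The CDC and WHO growth references are published as age- and sex-specific LMS parameters (Box-Cox power L, median M, coefficient of variation S), from which a Z score and percentile follow. The sketch below assumes that standard LMS formula and applies the case/control cut-offs described above; the L, M, S values are hypothetical placeholders, not entries from either reference table.

```python
import math
from statistics import NormalDist


def lms_z(x, L, M, S):
    """Z score from growth-reference LMS parameters (Box-Cox power L, median M, CV S)."""
    if L == 0:
        return math.log(x / M) / S
    return ((x / M) ** L - 1.0) / (L * S)


def obesity_status(bmi, L, M, S):
    """'case' if BMI-for-age percentile >= 95th, 'control' if <= 50th, else None (excluded)."""
    percentile = 100.0 * NormalDist().cdf(lms_z(bmi, L, M, S))
    if percentile >= 95.0:
        return "case"
    if percentile <= 50.0:
        return "control"
    return None


# Hypothetical LMS values for one sex and age; real values come from the CDC/WHO tables.
L, M, S = -1.5, 17.0, 0.12
for value in (16.0, 19.0, 30.0):
    print(value, round(lms_z(value, L, M, S), 2), obesity_status(value, L, M, S))
```

Children falling between the 50th and 95th percentiles are returned as None, reflecting their exclusion from the extreme-phenotype case-control comparison.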

SNP imputation and statistical analyses
We generated autosomal genome-wide imputed data based on 1000 Genomes phase 1 and 3 references, except for two studies that contributed Exomechip and MetaboChip (Illumina, San Diego, CA) genotypes and one study that blended genotypes from multiple platforms (Tables S5 and S6). Principal-component analyses (PCA) were conducted in each study (see selected examples in Figures S1-S3) to capture the main components of genetic ancestry from the Americas, Europe, Africa, and East Asia. Studies with samples from related individuals accommodated this non-independence by projecting PCs from the reference onto the study sample, and by accounting for relatedness using either generalized estimating equations 30 or linear mixed models. 9,31 Assuming an additive genetic model, we tested over 20 million autosomal variants for association with our traits, adjusting for all trait- or study-specific covariates (e.g., center, PCs).
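For unrelated samples, the additive-model test described above amounts to a linear regression of the (transformed) trait on allele dosage coded 0 to 2, with covariates. The sketch below uses ordinary least squares on simulated data; it does not reproduce the generalized estimating equations or mixed models that the studies with relatives used, and the function name is illustrative.

```python
import numpy as np
from scipy import stats


def additive_test(y, dosage, covariates):
    """OLS of a trait on allele dosage (0-2, additive coding) plus covariates.

    Returns (beta, se, p) for the dosage term. Unrelated samples only: this does
    not model relatedness (studies with relatives used GEE or mixed models).
    """
    n = len(y)
    X = np.column_stack([np.ones(n), dosage, covariates])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    df = n - X.shape[1]
    sigma2 = resid @ resid / df
    cov = sigma2 * np.linalg.inv(X.T @ X)
    se = np.sqrt(cov[1, 1])
    p = 2 * stats.t.sf(abs(coef[1] / se), df)
    return coef[1], se, p


# Simulated example: a variant with effect 0.3 SD per allele and two PCs as covariates.
rng = np.random.default_rng(2)
n = 5000
dosage = rng.binomial(2, 0.3, size=n).astype(float)
pcs = rng.normal(size=(n, 2))
y = 0.3 * dosage + 0.2 * pcs[:, 0] + rng.normal(size=n)
beta, se, p = additive_test(y, dosage, pcs)
```

Imputed dosages can be fractional, which is why the dosage column is treated as a continuous predictor rather than as genotype categories.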

Meta-analyses of HISLA stage 1 + 2
The studies of the HISLA Consortium were meta-analyzed in two stages: discovery (stage 1) and replication (stage 2). Stage 1 included a total sample of 59,771 individuals with data on BMI, 56,161 with height, and 42,455 with WHRadjBMI. All stage 1 studies/consortia provided full genome-wide analysis results. All SNPs that met our significance criteria were brought forward for replication in stage 2, which included 10,538 individuals with data on BMI, 8,110 with height, and 4,393 with WHRadjBMI. All reported association results passed our quality control criteria; i.e., variants with low imputation quality (info score < 0.4 or R² < 0.3), minor allele count (MAC) < 5, or sample size < 100 were removed. We meta-analyzed effects across all studies using a fixed-effect inverse variance weighted meta-analysis with genomic control in METAL. 32 Given the unique patterns of admixture and ancestry represented by the Brazilian or Native American samples, we conducted sensitivity analyses in stage 1 studies (i.e., comparing the inclusion and exclusion of the Baependi Heart Study, the 1982 Pelotas Birth Cohort Study, and the Family Investigation of Nephropathy and Diabetes substudy of individuals of Pima and Zuni heritage) to assess the influence of these three studies on the meta-analysis results. CANDELA was retained in all analyses because <10% of the consortium's samples came from Brazil, primarily from the South of Brazil, and these samples are characterized by high European heritage and less Native American or African admixture. 17 We provide the quantile-quantile plots for all analyses in Figure S4.
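The per-variant quality filters and the fixed-effect inverse variance weighted meta-analysis with genomic control described above can be sketched as below. This is an illustration of the standard formulas, not METAL itself: genomic control is assumed to inflate each study's standard errors by the square root of its λ (estimated from the median association χ² over the 1-df χ² median) only when λ > 1, and all function names are hypothetical.

```python
import numpy as np
from scipy.stats import chi2, norm

CHI2_1DF_MEDIAN = chi2.ppf(0.5, df=1)  # about 0.4549


def passes_qc(imputation_quality, mac, n, min_quality=0.4, min_mac=5, min_n=100):
    """Keep a variant only if quality, minor allele count and sample size pass.

    min_quality = 0.4 for info scores; use 0.3 for R^2-type metrics.
    """
    return imputation_quality >= min_quality and mac >= min_mac and n >= min_n


def gc_lambda(betas, ses):
    """Genomic-control inflation factor from one study's genome-wide results."""
    z2 = (np.asarray(betas, float) / np.asarray(ses, float)) ** 2
    return np.median(z2) / CHI2_1DF_MEDIAN


def ivw_meta(betas, ses, lambdas=None):
    """Fixed-effect inverse variance weighted meta-analysis of one variant.

    If per-study lambdas are given, each study's SE is inflated by sqrt(lambda)
    (only when lambda > 1) before weighting.
    """
    betas = np.asarray(betas, float)
    ses = np.asarray(ses, float)
    if lambdas is not None:
        ses = ses * np.sqrt(np.maximum(np.asarray(lambdas, float), 1.0))
    w = 1.0 / ses ** 2
    beta = np.sum(w * betas) / np.sum(w)
    se = np.sqrt(1.0 / np.sum(w))
    return beta, se, 2 * norm.sf(abs(beta / se))
```

Weighting by 1/SE² lets larger, more precise studies dominate the pooled estimate, and the genomic-control step downweights studies whose genome-wide statistics show inflation.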
Regional plots of all genome-wide significant HISLA stage 1 findings were generated using LocusZoom. From stage 1, we selected for replication the lead variants that met genome-wide significance (p < 5 × 10⁻⁸) and were independent of each other. In cases where stage 2 studies did not have the lead variant, we selected two proxies per lead variant with linkage disequilibrium (LD) r² ≥ 0.9 in 1000 Genomes AMR. Stage 2 studies provided results for the requested lead variants and/or their proxies from stage 1 for replication. Stage 2 studies were meta-analyzed and subsequently combined with stage 1 using METAL. 25 Effect heterogeneity was assessed through I² across all 27 HISLA adult studies/consortia by entering each study separately into the meta-analysis, irrespective of stage. The characteristics of the final SNP array data used in the HISLA adult studies and the children/adolescent Hispanic/Latino studies are summarized separately in Tables S5 and S6.
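The I² statistic mentioned above is derived from Cochran's Q across studies entered separately. The sketch below uses the standard definitions (Q as the weighted sum of squared deviations from the fixed-effect estimate, I² = max(0, (Q − df)/Q) × 100%); the function name and inputs are illustrative.

```python
import numpy as np


def cochran_q_i2(betas, ses):
    """Cochran's Q and I^2 (in %) for one variant across studies entered separately."""
    betas = np.asarray(betas, float)
    w = 1.0 / np.asarray(ses, float) ** 2
    beta_fe = np.sum(w * betas) / np.sum(w)  # fixed-effect pooled estimate
    q = float(np.sum(w * (betas - beta_fe) ** 2))
    df = len(betas) - 1
    i2 = 0.0 if q <= 0 or df < 1 else max(0.0, (q - df) / q) * 100.0
    return q, i2
```

I² expresses the share of variation in study estimates beyond what sampling error alone would produce, which is why it is used alongside directional consistency when judging transferability.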

Meta-analyses of HISLA stage 1 with other ancestral consortia
In addition to a Hispanic/Latino-only meta-analysis, we combined the HISLA stage 1 meta-analysis with data from previous large-scale GWAS meta-analyses of European (the Genetic Investigation of Anthropometric Traits [GIANT] Consortium, [33][34][35] N ≈ 300,000) and/or African (the African Ancestry Anthropometry Genetics Consortium [AAAGC], 36,37 N ≈ 50,000) descent populations. We used fixed-effect inverse variance weighted meta-analytic techniques in METAL to generate our trans-ancestral meta-analysis. 32 We then assessed (1) the transferability of the findings from the BMI, height, 38 and WHRadjBMI 39 trans-ancestral meta-analyses to an independent sample of Hispanic/Latino children/adolescents or (2) the replication of the signal in the British subsample GWAS of the United Kingdom Biobank (UKBB). LD plots and regional plots are shown in the supplemental information (Figures S5-S53).
In this paper, we consider a signal as replicated when the effect of an allele is observed in two independent populations with the same or overlapping ancestral background, whereas generalization (transferability) refers to the observation of the same signal in an independent sample but with a distinct ancestral background, or in a distinct period of the life course. Furthermore, SNP associations were defined as either newly discovered or established, depending on their location. An established locus was defined as an SNP association within ±500 kb of at least one previously identified index SNP; otherwise, the association was considered a newly discovered locus.
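The positional rule above can be sketched as a simple distance check against published index SNPs. Treating a distance of exactly 500 kb as "within" the window is an assumption, and the index SNP positions in the example are hypothetical, not real catalog entries.

```python
def classify_locus(chrom, pos, index_snps, window_bp=500_000):
    """Return 'established' if (chrom, pos) lies within +/- window_bp of any
    published index SNP on the same chromosome, else 'newly discovered'."""
    for idx_chrom, idx_pos in index_snps:
        if idx_chrom == chrom and abs(idx_pos - pos) <= window_bp:
            return "established"
    return "newly discovered"


# Hypothetical index SNP positions (not real catalog entries).
published = [("2", 1_400_000), ("5", 60_000_000)]
print(classify_locus("2", 1_000_000, published))
print(classify_locus("2", 2_000_000, published))
```

A linear scan is enough for a few thousand index SNPs; a sorted per-chromosome list with binary search would scale better if the catalog were much larger.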
We designated our Hispanic/Latino SNP associations within either newly discovered or established loci as ''novel'' if they met the following criteria for replication: (1) they were associated at p < 5 × 10⁻⁸ in HISLA stage 1 and were directionally consistent in the independent stage 2 sample, and (2) the addition of stage 2 samples improved (lowered) the p value of the stage 1 + 2 meta-analysis. For the trans-ancestral analyses, the designation of a signal as novel was based on SNPs that were: (1) associated at p < 5 × 10⁻⁸ in the combined HISLA, AAAGC, and GIANT meta-analysis, and (2) directionally consistent with the trans-ancestral meta-analysis and associated at p < 5 × 10⁻² in an independent sample of Hispanic/Latino children/adolescents (generalization across age period) or in the British subsample GWAS from the UKBB (replication).
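The two sets of novelty criteria above can be written as boolean rules. This is an illustrative encoding with hypothetical function names; "improved the p value" is interpreted as the stage 1 + 2 p value being smaller than the stage 1 p value.

```python
GW_THRESHOLD = 5e-8


def novel_hisla(p_stage1, beta_stage1, beta_stage2, p_stage12):
    """HISLA rule: genome-wide significant in stage 1, same direction in stage 2,
    and a stage 1 + 2 p value smaller than the stage 1 p value."""
    return (p_stage1 < GW_THRESHOLD
            and beta_stage1 * beta_stage2 > 0
            and p_stage12 < p_stage1)


def novel_trans_ancestral(p_meta, beta_meta, beta_indep, p_indep, alpha=0.05):
    """Trans-ancestral rule: genome-wide significant in HISLA + AAAGC + GIANT, then
    same direction and p < alpha in an independent sample (children/adolescents or UKBB)."""
    return (p_meta < GW_THRESHOLD
            and beta_meta * beta_indep > 0
            and p_indep < alpha)
```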
Hispanic/Latino SNP effects were considered to transfer (or generalize) to Hispanic/Latino children/adolescents or to African or European ancestry adults if they were: (1) directionally consistent, (2) associated at p < 5 × 10⁻², and (3) had heterogeneity of I² < 75% in the Hispanic/Latino children/adolescent lookups, the adult AAAGC, or the adult GIANT GWAS lookups. SNP effects of variants previously associated with anthropometric traits in non-Hispanic/Latino populations (i.e., index published SNPs) were considered to be transferable (generalizable) to Hispanic/Latinos only if they were: (1) directionally consistent, (2) associated at p < 5 × 10⁻², and (3) had little to moderate effect heterogeneity (I² < 75%) in stage 1.
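The three transferability conditions above (same direction, p < 0.05, I² < 75%) can be checked per lookup and then counted across a set of index SNPs. The function name and the lookup rows below are hypothetical illustrations, not study results.

```python
def transferable(beta_ref, beta_test, p_test, i2_percent, alpha=0.05, max_i2=75.0):
    """Directionally consistent, nominally significant (p < alpha), and I^2 < max_i2 (%)."""
    return beta_ref * beta_test > 0 and p_test < alpha and i2_percent < max_i2


# Hypothetical lookups: (published beta, HISLA stage 1 beta, HISLA p, I^2 %).
lookups = [
    (0.03, 0.02, 0.01, 10.0),   # passes all three criteria
    (0.03, -0.01, 0.02, 0.0),   # opposite direction
    (0.05, 0.04, 0.20, 5.0),    # not nominally significant
    (0.02, 0.03, 0.001, 80.0),  # too heterogeneous
]
n_transferable = sum(transferable(*row) for row in lookups)
```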

Fine-mapping methods
We used FINEMAP 62 to fine-map the loci and signals identified in the HISLA stage 1 meta-analysis or the trans-ancestral meta-analysis, in both established and novel loci. For the established loci, we included index SNP associations published as of April 2018 (BMI, 33,36,40,[42][43][44]46,48,[51][52][53][54] height, 34,50,56 and WHRadjBMI 35,36,44,46,59 ), prior to the publications with the UKBB results. 38,39 For each locus, we used a 1 Mb region subset of the summary statistics from the stage 1 meta-analyses and calculated the LD in the HCHS/SOL 9 unrelated sample set (N ≈ 7,670). For consistency with the FINEMAP package, we refer to the results using similar language; however, we do not mean to imply that the SNP(s) in the ''causal set'' are causative variants.
For trans-ancestral fine-mapping of the novel loci or new signals identified in the trans-ancestral meta-analysis of HISLA, AAAGC, and GIANT, we used a 1 Mb region defining each locus using the summary statistics of the given meta-analysis. We calculated the LD for Hispanic/Latino samples using the HCHS/SOL 9 unrelated sample (N ≈ 7,670). For African and European ancestry samples, we calculated the LD using the ARIC unrelated sample that included self-reported African ancestry (N ≈ 2,800) and European ancestry (N ≈ 9,700). We weighted the LD matrices by the GWAS sample sizes for each trait (HISLA range, ≈42,400-56,100; AAAGC, 20,300-42,700; GIANT, 210,000-330,000).
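The text states that the ancestry-specific LD matrices were weighted by GWAS sample size but does not give the exact formula. One plausible reading, sketched below as an assumption, is a sample-size-weighted average of the signed correlation matrices, which requires all matrices to share the same SNP order and allele coding. The matrices and sample sizes are hypothetical.

```python
import numpy as np


def weighted_ld(ld_matrices, sample_sizes):
    """Average per-ancestry LD (signed correlation) matrices with weights
    proportional to GWAS sample size; all matrices must share SNP order."""
    w = np.asarray(sample_sizes, float)
    w = w / w.sum()
    return sum(wi * np.asarray(R, float) for wi, R in zip(w, ld_matrices))


# Two hypothetical 2-SNP LD matrices with equal sample sizes.
R_amr = np.array([[1.0, 0.8], [0.8, 1.0]])
R_eur = np.array([[1.0, 0.2], [0.2, 1.0]])
R_mix = weighted_ld([R_amr, R_eur], [50_000, 50_000])
```

A convex combination of valid correlation matrices keeps a unit diagonal and stays positive semi-definite, so the result remains usable as an LD input for fine-mapping.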
All regions allowed up to a maximum of 10 causal variants, as defined by FINEMAP. The cumulative 95th percentile credible set was calculated from the estimated posterior probabilities. Convergence failed for three regions (lead SNPs at known height loci: rs2902635, rs6900530, and rs4425978) using the stochastic approach. For these three regions, we used the conditional approach to determine the number of causal variants.
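A cumulative credible set can be built from posterior probabilities by ranking SNPs in decreasing order and adding them until the running total reaches 95%. The sketch below shows this generic construction for a single signal; it does not parse FINEMAP output files, and the SNP labels and probabilities are hypothetical.

```python
import numpy as np


def credible_set(snp_ids, posterior_probs, coverage=0.95):
    """Smallest set of SNPs, taken in decreasing posterior probability, whose
    cumulative probability reaches the requested coverage."""
    probs = np.asarray(posterior_probs, float)
    chosen, total = [], 0.0
    for i in np.argsort(probs)[::-1]:
        chosen.append(snp_ids[i])
        total += probs[i]
        if total >= coverage:
            break
    return chosen


# Hypothetical posterior probabilities for one signal (not FINEMAP output).
print(credible_set(["rsA", "rsB", "rsC", "rsD"], [0.70, 0.20, 0.06, 0.04]))
```

A single SNP with posterior probability above 0.95 yields a one-variant credible set, whereas a diffuse signal yields a large set, which matches the contrast between well- and poorly resolved loci.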

Gene expression and other bioinformatic analyses
We performed association analyses of measured whole blood gene expression in 606 individuals from the Cameron County Hispanic Cohort. 63 RNA sequencing was conducted using 150 bp paired-end reads on the Illumina NovaSeq 6000 by Vanderbilt Technologies for Advanced Genomics. Initial sequencing quality was checked with FastQC. 64 STAR-2.7.8a was used to align sequencing reads to the human genome reference (UCSC, hg38), 65 and the aligned reads were assigned to genes using featureCounts. 66 We excluded samples with fewer than 15 million total aligned reads, an alignment success rate below 20%, or fewer than 15 million total assigned reads. The sequencing library size was normalized using DESeq2, 67 and read counts were transformed using the variance stabilizing transformation (vst in the DESeq2 package). We performed expression quantitative trait locus (eQTL) analysis with our top HISLA SNP findings by modeling SNP dosage (exposure) in a linear regression of gene expression levels (outcome) for each gene within the 1 Mb interval around each lead SNP. We inverse normalized the gene expression levels and adjusted for age, sex, and three PCs to capture population substructure. The Bonferroni correction for each region varied according to the number of SNPs tested.
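The sample exclusion rule above is an "any criterion fails" rule, so a sample is kept only when all three thresholds are met. The sketch below encodes that logic together with a region-specific Bonferroni threshold. The α = 0.05 family-wise level and treating each SNP-gene test as one test are assumptions; the sample records are hypothetical.

```python
MIN_READS = 15_000_000
MIN_ALIGNMENT_RATE = 0.20


def keep_sample(total_aligned, alignment_rate, total_assigned):
    """A sample is excluded if ANY criterion fails, so it is kept only if all pass."""
    return (total_aligned >= MIN_READS
            and alignment_rate >= MIN_ALIGNMENT_RATE
            and total_assigned >= MIN_READS)


def bonferroni_alpha(n_tests, alpha=0.05):
    """Region-specific Bonferroni threshold for n_tests SNP-gene association tests."""
    return alpha / n_tests


# Hypothetical samples: (aligned reads, alignment rate, assigned reads).
samples = {
    "s1": (30_000_000, 0.92, 25_000_000),
    "s2": (14_000_000, 0.95, 12_000_000),
    "s3": (40_000_000, 0.15, 20_000_000),
}
kept = [name for name, qc in samples.items() if keep_sample(*qc)]
```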
To gain further insight into the possible functional role of the identified variants and to assess their relevance to other phenotypes, we conducted bioinformatic queries of our potentially novel loci and new signals within known loci in multiple publicly available databases, including PhenoScanner, 68 RegulomeDB, 69 Haploreg, 70 UCSC GenomeBrowser, 71 and GTEx. 72

Trans-ancestral findings to account for population structure in previous GWAS
We demonstrated the degree to which the present trans-ancestral meta-analysis could lessen the bias induced by population stratification, using height from HISLA as an example. We first conducted PCA on the four European populations (CEU, GBR, IBS, and TSI) from 1000 Genomes. We excluded the Finnish population because its known unique demographic history could drive or dominate the top PCs in a limited sample. 33 We only used biallelic SNPs with minor allele frequency (MAF) > 5% in the four European populations, and then pruned them by both distance and LD using PLINK 1.9. 73 Specifically, we pruned the dataset such that no two SNPs were closer than 2 kb, and then pruned using a 50-SNP LD window (moving in steps of 5 SNPs), such that no pair of SNPs within the window had r² > 0.2. We further removed SNPs in regions of long-range LD. 74 PCA was performed on the remaining SNPs using Eigensoft version 7.2.1.
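The MAF filter and 2 kb distance-thinning steps above can be illustrated with a greedy left-to-right pass per chromosome. This is a hypothetical re-implementation for exposition, not PLINK's own code; the subsequent LD pruning (PLINK's `--indep-pairwise` with a 50-SNP window, step of 5, and r² threshold of 0.2) and the long-range LD exclusion are not reproduced here.

```python
from collections import defaultdict


def prefilter(snps, min_maf=0.05, min_gap_bp=2_000):
    """Keep SNPs with MAF > min_maf, then thin each chromosome left to right so
    that no two kept SNPs are closer than min_gap_bp.

    snps: iterable of (chrom, position, maf) tuples.
    """
    by_chrom = defaultdict(list)
    for chrom, pos, maf in snps:
        if maf > min_maf:
            by_chrom[chrom].append(pos)
    kept = []
    for chrom in sorted(by_chrom):
        last = None
        for pos in sorted(by_chrom[chrom]):
            if last is None or pos - last >= min_gap_bp:
                kept.append((chrom, pos))
                last = pos
    return kept


# Hypothetical SNPs: (chromosome, position, MAF).
example = [("1", 1000, 0.10), ("1", 2500, 0.20), ("1", 3000, 0.30),
           ("1", 5000, 0.01), ("2", 100, 0.40)]
print(prefilter(example))
```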
We performed linear regressions of individual PC values on the allelic genotype count for each polymorphic variant in the four European populations from 1000 Genomes and used the resulting regression coefficients as the estimate of the variant's PC loading.
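The PC loading step above is a per-variant simple linear regression of PC scores on allele counts, whose slope equals cov(genotype, PC) / var(genotype). The sketch below vectorizes that slope across variants; the array shapes are assumptions about how genotypes would be stored, and variants are assumed to be polymorphic so the variance is non-zero.

```python
import numpy as np


def pc_loadings(genotypes, pc_scores):
    """Per-variant slope from regressing one PC's scores on allele counts.

    genotypes: (n_individuals, n_variants) array of 0/1/2 counts, polymorphic only.
    pc_scores: (n_individuals,) values of a single PC.
    """
    G = np.asarray(genotypes, float)
    y = np.asarray(pc_scores, float)
    Gc = G - G.mean(axis=0)   # center each variant's allele counts
    yc = y - y.mean()         # center the PC scores
    return (Gc * yc[:, None]).sum(axis=0) / (Gc ** 2).sum(axis=0)


# Simulated genotypes for 200 individuals and 3 variants.
rng = np.random.default_rng(4)
G = rng.integers(0, 3, size=(200, 3))
pc = 2.0 * G[:, 0] + 0.5 + rng.normal(0, 0.1, size=200)
loadings = pc_loadings(G, pc)
```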
For each PC, we then computed Pearson correlation coefficients between PC loadings and effect sizes (of variants with MAF > 1%) from each set of GWAS summary statistics. We estimated p values based on jackknife standard errors, by splitting the genome into 1,000 blocks with an equal number of variants. If the GWAS summary statistics are not biased by residual stratification (in this case due to European geographical structure), the correlation coefficients would be expected to be zero. If there was significant correlation in either the GIANT dataset or the HISLA stage 1, AAAGC, and GIANT trans-ancestral meta-analysis, we then further evaluated the reduction in bias due to stratification in the trans-ancestral meta-analysis by comparing the correlation coefficients in the trans-ancestral meta-analysis with those in GIANT. Restricting to variants shared between GIANT and the trans-ancestral meta-analysis, we computed the difference in their correlation coefficients between PC loadings and effect sizes, and again estimated p values based on jackknife standard errors from 1,000 equal-sized blocks.
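The delete-one-block jackknife above can be sketched generically: the statistic is recomputed with each block of variants left out, and the standard error uses the usual (g − 1)/g scaling. The sketch assumes variants are already in genomic order so that contiguous index blocks correspond to genomic blocks; it covers both the single correlation and the difference in correlations between two sets of effect sizes measured on the same variants. The data below are simulated.

```python
import numpy as np
from scipy.stats import norm


def _corr(x, y, mask):
    return np.corrcoef(x[mask], y[mask])[0, 1]


def block_jackknife(stat, n_variants, n_blocks=1000):
    """Delete-one-block jackknife: full-data estimate, SE, and two-sided p."""
    blocks = np.array_split(np.arange(n_variants), n_blocks)
    full = stat(np.ones(n_variants, dtype=bool))
    loo = []
    for idx in blocks:
        mask = np.ones(n_variants, dtype=bool)
        mask[idx] = False  # leave this block of consecutive variants out
        loo.append(stat(mask))
    loo = np.asarray(loo)
    g = len(blocks)
    se = np.sqrt((g - 1) / g * np.sum((loo - loo.mean()) ** 2))
    return full, se, 2 * norm.sf(abs(full / se))


def corr_test(loadings, effects, n_blocks=1000):
    """Correlation of PC loadings with GWAS effect sizes, with jackknife SE."""
    x = np.asarray(loadings, float)
    y = np.asarray(effects, float)
    return block_jackknife(lambda m: _corr(x, y, m), len(x), n_blocks)


def corr_difference_test(loadings, effects_a, effects_b, n_blocks=1000):
    """Difference in correlations (e.g., GIANT minus trans-ancestral) on shared variants."""
    x = np.asarray(loadings, float)
    a = np.asarray(effects_a, float)
    b = np.asarray(effects_b, float)
    return block_jackknife(lambda m: _corr(x, a, m) - _corr(x, b, m), len(x), n_blocks)
```

Computing the difference inside each leave-one-block-out replicate, rather than combining two separate standard errors, accounts for the fact that both correlations share the same PC loadings and overlapping samples.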

Discovery of one BMI locus in Hispanic/Latino adults
The first goal of this study was to conduct a genome-wide meta-analysis of anthropometric traits in Hispanic/Latino adults to identify loci in an under-studied population (Figure 1). Regional plots of all potentially novel genome-wide significant HISLA stage 1 findings are shown in the supplemental information (Figures S6-S11).
No novel anthropometric loci were identified in all HISLA stage 1 samples combined. Yet, when we excluded the samples of exclusively Brazilian or Native American heritage from stage 1, we discovered one locus for adult BMI at PAX3 on chromosome 2 in the HISLA stage 1 sample (Table S7) and replicated this locus in HISLA stage 2 (Table 1). The lead SNP at this locus, rs994108, is in moderate LD with a previously reported SNP (rs7559271, r² = 0.46 in 1000 Genomes AMR) (Figure 2) and lies on the same haplotype reported to influence facial morphology, including the position of the nasion (the deepest point on the nasal bridge, where the nose meets the forehead), in individuals of European 75 and Hispanic/Latino 76 descent. Other PAX3 variants in lower LD with the lead SNP have also been associated with nasion position, 77 monobrow, and male-pattern baldness. 78,79 PAX3 is a well-known transcription factor in normal embryonic neural crest development and differentiation. 80 Neural crest cells can give rise to mesenchymal stem cells, 81 which can in turn give rise to adipocytes; 81-83 thus, a possible role of PAX3 in adipogenesis may at least partially explain the association signal with BMI near this gene.
Another BMI SNP (rs1505851-T) near ARRDC3 on chromosome 5 was associated at genome-wide significance in HISLA stage 1 (Table S7; Figure S6), but did not replicate in stage 2 (effect allele frequency = 71%) or generalize to AAAGC or GIANT (effect allele frequency = 31% or 68%; Table 1). However, the association was directionally consistent and showed some signal in AAAGC (p = 7 × 10⁻⁴). LD patterns at ARRDC3 appear to be similar across the three ancestries (albeit with a smaller LD block in 1000 Genomes AFR), suggesting that the lack of generalization may be related to frequency differences, haplotype effects, or a false positive (Figure S5A).
We identified two WHRadjBMI loci, at DOCK2 and TAOK3, at genome-wide significance in HISLA stage 1 after excluding the Brazilian and Native American samples (Table S7; Figures S7 and S8), yet neither met the p value threshold for replication in HISLA stage 2. The DOCK2 association with WHRadjBMI observed among women only in stage 1 (Figure S7) was directionally consistent in the female stage 2 sample. There were more SNPs in high LD (0.8 ≤ r² ≤ 1) with rs6879439 in 1000 Genomes AMR than in the AFR and EUR references (Figure S5B), which may explain why this SNP association did not generalize to AAAGC or GIANT (Table 1).
The genome-wide significant stage 1 TAOK3 association was led by a low-frequency variant (rs115981023-A, MAF = 0.9%), but the associations at this variant were not directionally consistent across stages (Table 1; Figure S8). In fact, patterns of very high LD were seen with rs115981023 in 1000 Genomes AMR and AFR (Figure S5C), even though this variant is more frequent in African ancestry (e.g., MAF = 0.9% in HISLA versus 5% in AAAGC). Moreover, rs115981023 exhibited moderate heterogeneity across stage 1 samples after excluding Brazilian and Native American samples (I² = 45%); evidence of moderate heterogeneity remained (I² = 52%) in the combined meta-analysis of HISLA stage 1 and 2 samples (Table 1). Finally, this variant was least frequent in European ancestry (MAF = 0.2% in GIANT), which explains the lack of a proxy for generalization in GIANT (Table 1).
Figure 2. Regional plot of novel body mass index signal at PAX3. Regional plot, unconditioned (A) and conditioned (B) on established variants within ±500 kb of the lead variant, at the BMI locus at PAX3 in HISLA (after excluding Brazilian and Native American samples). Linkage disequilibrium patterns are based on rs994108 (shown by the purple diamond) from the Hispanic Community Health Study/Study of Latinos.
No potentially novel loci were identified for height in HISLA stage 1, and the exclusion of the Brazilian and Native American samples did not reveal additional height or WHRadjBMI loci.
Discovery of three signals in established loci for BMI and height in Hispanic/Latino adults
At two established loci for BMI, we identified additional signals at ADCY5 and near ILRUN (Table S7). These signals were both independent of any previously published anthropometric findings (Table S8; Figures S9 and S10). We replicated these signals in stage 2 with directional consistency and in the combined stage 1 + 2 meta-analysis at GWAS significance (Table 1). We also identified one additional signal for height in an established height locus, B4GALNT3, which was independent of the previously reported SNPs for height (Tables S7 and S8; Figure S11). We replicated this signal in stage 2 with directional consistency and a stage 1 + 2 meta-analysis that was GWAS significant (Table 1). In additional gene expression and bioinformatic analyses (Tables S18-S20), we found that each of the three additional signals in established anthropometric loci is supported by an eQTL in whole blood in Hispanic/Latino populations (Table S18), and an eQTL in other relevant tissues, e.g., thyroid, esophagus, artery, using publicly available (non-Hispanic/Latino) datasets (Tables S19 and S20).

Fine-mapping of Hispanic/Latino anthropometric findings
We fine-mapped the PAX3 locus for BMI and the three additional signals in known loci (BMI, ADCY5 and ILRUN; height, B4GALNT3; Table S9). For the three BMI loci, FINEMAP revealed one potential causal set for each of the PAX3, ADCY5, and ILRUN loci. For the PAX3 locus, the 95th percentile credible set contained only nine plausibly causal SNPs, with the lead SNP rs994108 having a very high posterior probability of being causal (0.89, Table S21). However, functional annotation of this SNP was unremarkable (Tables S22 and S23). In contrast, for ADCY5 and ILRUN, FINEMAP revealed one causal configuration for each locus but with much greater uncertainty about the likely functional variant, given the size of the credible sets, which contained 14 and 22 SNPs for ADCY5 and ILRUN, respectively. The posterior probabilities of the best lead SNPs at these loci were relatively low: 0.23 for rs17361324 (ADCY5) and 0.11 for rs73420913 (ILRUN). Interestingly, however, the best candidates for causality at the PAX3 and ADCY5 loci were the lead SNPs from the HISLA meta-analysis; for ILRUN, the FINEMAP and HISLA lead SNPs were in high LD (rs73420913 had an r² = 0.96 with the lead HISLA SNP rs148899910), providing greater support for the prioritization of these SNPs for functional interrogation. For the B4GALNT3 height locus, FINEMAP revealed six causal configurations. Four of these (led by rs11063185, rs215230, rs7303572, and rs11063184) each had a lead variant with a posterior probability >0.99 and a 95th percentile credible set containing only that variant. One (led by rs215223) had a posterior probability of 0.93 and thus included two variants in its 95th percentile credible set. The sixth 95th percentile credible set had a lead variant with a posterior probability of 0.45 but contained a total of 1,621 additional variants, all of which had very small posterior probabilities (i.e., ≤0.05).
Transferability of adult loci/signals from Hispanic/Latinos to consortia of other ancestral backgrounds
To assess how well the effect estimates transfer to other populations, we looked up the BMI and height findings from Hispanic/Latinos in the AAAGC and GIANT meta-analysis results (Table 1). The BMI signal at the ADCY5 locus (rs17361324) transferred to both AAAGC and GIANT with directional consistency (beta = 0.13-0.23) and at nominal significance (p < 5 × 10⁻²). The lead SNP (rs148899910) representing the BMI signal near ILRUN was not available in GIANT; the signal appeared to be transferable only to GIANT (at proxy SNP rs1573905, r² = 0.96-1 in 1000 Genomes AMR and EUR; Table 1). The signal for height in B4GALNT3 (rs215226) was directionally consistent and nominally significant in AAAGC only. In all cases, the effect sizes observed in GIANT and AAAGC were attenuated compared with the effect sizes from HISLA stage 1.

Relevance of adult Hispanic/Latino anthropometric findings to childhood/adolescence
We looked up our novel HISLA findings in Hispanic/Latino children/adolescents using BMI-for-age and height-for-age Z scores, as well as a case-control study of childhood obesity. Two of the three novel BMI signals were directionally consistent with the anticipated effect on the odds of obesity during childhood/adolescence, one of which was nominally significant (rs17361324 at ADCY5; p = 2.2 × 10⁻²). None of the HISLA findings generalized at nominal significance to the BMI- or height-for-age Z scores, but they were directionally consistent with the corresponding effect in adulthood (Table S10). This may have been due to the small available sample size of Hispanic/Latino children/adolescents.

Transferability of established anthropometric loci to Hispanic/Latino adults
We assessed how many established anthropometric loci, described previously in predominantly non-Hispanic/Latino European samples, could be transferred to Hispanic/Latino adults, given the available Hispanic/Latino sample size in stage 1. As shown in Table S11, the index SNPs at 336 of 1,247 (26.9%) previously reported BMI loci were suggestively transferable at nominal significance to Hispanic/Latinos. Of these BMI loci, 36 SNPs in HISLA stage 1 displayed directional consistency with the literature and Bonferroni significance (Table S11). Furthermore, one BMI locus was genome-wide significant at the same published variant, and another 12 BMI loci were genome-wide significant at another SNP within 1 Mb and in moderate to high LD (r² ≥ 0.52 in AMR) with the reported index SNP (Table S7). Table S12 shows that a slightly higher percentage of known height loci (1,177 of 3,806, or 30.9%) were transferable to Hispanic/Latinos. Of these loci, 124 SNPs were directionally consistent and Bonferroni significant (Table S12). Ten height loci were genome-wide significant at the same lead variant, and another 39 height loci were associated at genome-wide significance at another SNP within 1 Mb (0.05 ≤ r² ≤ 0.98 in AMR; Table S7).
Finally, Tables S13-S15 show that 143 of 694 (20.6%) known WHRadjBMI loci in both sexes combined, 133 of 567 (23.5%) in women only, and 28 of 173 (16.2%) in men only were transferable to Hispanic/Latinos at nominal significance. Of these, a total of 15 loci were associated with WHRadjBMI at Bonferroni significance in the combined, women-only, or men-only analyses (Tables S13-S15). None of the index SNPs from the previous literature for WHRadjBMI reached genome-wide significance; however, we did observe genome-wide significant evidence of association with WHRadjBMI for an SNP in strong LD with the index variant (r² = 0.92 in AMR) at the HOXC13 signal (Table S7).
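The transferability criteria used above (nominal significance plus directional consistency with the published effect, and a trait-specific Bonferroni threshold) can be expressed as a short filter. The sketch below is illustrative only: the `Lookup` record, its field names, and the assumption that the Bonferroni denominator equals the number of index SNPs looked up for the trait (e.g., 0.05/1,247 ≈ 4.0 × 10⁻⁵ for BMI) are hypothetical and not taken from the HISLA pipeline. Effect estimates are assumed to be already aligned to the same coded allele.

```python
from dataclasses import dataclass


@dataclass
class Lookup:
    """One published index SNP looked up in the Hispanic/Latino results (hypothetical record)."""
    snp: str
    published_beta: float  # published effect of the coded allele
    hisla_beta: float      # HISLA effect of the same coded allele (already allele-aligned)
    hisla_p: float         # HISLA p-value


def classify_transferability(lookups, alpha=0.05):
    """Split SNPs into suggestively transferable and Bonferroni-significant lists.

    Suggestive: p < alpha and the same direction of effect as published.
    Bonferroni: suggestive and p < alpha / (number of SNPs looked up for this trait).
    """
    bonferroni_threshold = alpha / len(lookups)
    suggestive, bonferroni = [], []
    for x in lookups:
        same_direction = x.published_beta * x.hisla_beta > 0
        if same_direction and x.hisla_p < alpha:
            suggestive.append(x.snp)
            if x.hisla_p < bonferroni_threshold:
                bonferroni.append(x.snp)
    return suggestive, bonferroni


if __name__ == "__main__":
    # Toy values, not study data.
    toy = [
        Lookup("rsA", 0.10, 0.05, 0.02),     # consistent, nominal only
        Lookup("rsB", 0.10, -0.05, 0.001),   # opposite direction
        Lookup("rsC", -0.20, -0.10, 0.001),  # consistent and below 0.05 / 4
        Lookup("rsD", 0.10, 0.10, 0.20),     # not nominally significant
    ]
    print(classify_transferability(toy))  # (['rsA', 'rsC'], ['rsC'])
    print(f"{336 / 1247:.1%}")            # 26.9%
```

With four toy lookups the Bonferroni threshold is 0.0125, so rsA is only suggestively transferable while rsC passes both criteria.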

Replication of five novel loci and 33 new signals in established loci for adult anthropometric traits from a trans-ancestral meta-analysis
Our secondary goal was to assemble a trans-ancestral meta-analysis of HISLA stage 1, AAAGC, and GIANT consortia results to identify additional novel loci and fine-map established loci by leveraging differences in allele frequencies across populations (Figure 1). As anticipated, this trans-ancestral meta-analysis of HISLA, AAAGC, and GIANT revealed new insights, including 8 novel loci and 35 new signals in established loci that were associated at genome-wide significance (Table S16; Figures S12-S53) and independent of established SNPs within a 10 Mb region (Table 2). Of this set, 5 loci (3 BMI, 1 height, and 1 WHRadjBMI) and 33 signals in established loci (3 BMI, 28 height, and 2 WHRadjBMI) were generalized in the adult British subsample of the UKBB. In some cases, the significance in the trans-ancestral results was driven more by the AAAGC and/or HISLA consortia, which could explain the lack of association in the UKBB British subsample (Table S16; Figure S54).
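For readers less familiar with how per-consortium results are combined, a minimal sketch of a fixed-effect meta-analysis for one SNP follows. This section does not state the weighting scheme; the Discussion mentions fixed-effect meta-analyses, and inverse-variance weighting is assumed here as the common choice. The function name and toy inputs are hypothetical, and the HISLA, AAAGC, and GIANT estimates are assumed to be aligned to the same coded allele.

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance-weighted fixed-effect meta-analysis for one SNP.

    betas, ses: per-study effect estimates and standard errors for the same coded allele.
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    weights = [1.0 / se ** 2 for se in ses]  # more precise studies get more weight
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return pooled_beta, pooled_se, z, p


GENOME_WIDE_ALPHA = 5e-8

if __name__ == "__main__":
    # Toy per-consortium estimates (HISLA, AAAGC, GIANT order), not study data.
    beta, se, z, p = fixed_effect_meta([0.030, 0.025, 0.020], [0.008, 0.009, 0.003])
    print(f"beta={beta:.4f} se={se:.4f} z={z:.2f} p={p:.2e} genome-wide={p < GENOME_WIDE_ALPHA}")
```

Because the weights are inverse variances, a large consortium with small standard errors dominates the pooled estimate, which is one reason a locus can reach significance in the combined analysis while being driven mainly by a subset of contributing studies.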
We looked up the findings from our trans-ancestral meta-analyses in the sample of Hispanic/Latino children/adolescents (Table S17). We found that 2 of the 7 BMI and height trans-ancestral loci, and 17 of the 33 trans-ancestral BMI/height signals in established loci, were directionally consistent between their adult directions of association and the BMI/height-for-age Z scores in children/adolescents. However, this degree of directional consistency was no greater than expected by chance alone (binomial p > 0.10). Four trans-ancestral SNPs were associated at nominal significance in the child/adolescent sample, each of which had already been replicated in the UKBB (Table S16). Three of these four were directionally consistent between the childhood/adolescence results and the trans-ancestral adult findings (Table S17).
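The comparison with chance can be illustrated with an exact binomial sign test. The section reports only that the binomial p value exceeded 0.10; the sketch assumes a one-sided test of the number of direction-consistent SNPs against a null probability of 0.5, treating SNPs as independent, which may differ from the authors' exact specification. The function name is hypothetical.

```python
from math import comb


def sign_test_p(n_consistent, n_total):
    """One-sided exact binomial p-value, P(X >= n_consistent) with X ~ Binomial(n_total, 0.5).

    Under the null, each SNP's child/adolescent direction agrees with its adult
    direction with probability 0.5, independently of the other SNPs.
    """
    tail = sum(comb(n_total, k) for k in range(n_consistent, n_total + 1))
    return tail / 2 ** n_total


if __name__ == "__main__":
    print(sign_test_p(2, 7))    # 0.9375 (trans-ancestral loci)
    print(sign_test_p(17, 33))  # 0.5 (signals in established loci)
```

Under this assumed test, 17 consistent out of 33 is exactly at the null expectation, in line with the conclusion that the observed consistency did not exceed chance.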

Fine-mapping of trans-ancestral anthropometric findings
We also fine-mapped our trans-ancestral findings (Table S21) using FINEMAP to pinpoint the individual variants and genes within each locus that have a direct effect on the trait. FINEMAP uses a shotgun stochastic search algorithm 84 that iterates through causal configurations of SNPs, concentrating its effort on the configurations with non-negligible posterior probability. Within a 1 Mb region, we report (1) the causal configuration of SNPs for a given trait that had the highest posterior probability and (2) each SNP's posterior probability of being causal.
For four of the five trans-ancestral loci (three BMI loci and one WHRadjBMI locus), the configuration with the highest posterior probability contained a single SNP. For the height locus near ANKRD36BP1, that configuration contained two SNPs. In all five loci, the SNP with the highest posterior probability in each credible set was either the SNP with the strongest GWAS evidence or in high LD (r² between 0.70 and 0.99 in each ancestry) with the lead GWAS SNP. Two of these five regions were strongly prioritized, with high posterior probabilities (≥0.8) and small 95% credible sets: (1) for BMI, rs150992 in the CHD1-DT region had a posterior probability of 0.88, with three SNPs in the credible set, and (2) for height, rs10737541 in the ANKRD36BP1 region had a posterior probability of 0.93, with five SNPs in the credible set. Functional annotations (Tables S22 and S23) show that all three BMI loci, the height locus, and the WHRadjBMI locus have enhancer marks and eQTLs, most of which are in highly relevant tissues (e.g., adipose, brain, muscle, and thyroid).
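To make the two reported quantities concrete (each SNP's posterior probability of being causal and the 95% credible set built from those probabilities), the sketch below uses a deliberately simpler model than FINEMAP: it assumes exactly one causal SNP per region, an equal prior probability for every SNP, and Wakefield's approximate Bayes factor computed from marginal effect estimates and standard errors. It does not perform FINEMAP's shotgun stochastic search over multi-SNP configurations and ignores LD, so it cannot reproduce configurations with two or more causal SNPs, such as the one at ANKRD36BP1. The prior effect standard deviation (`prior_sd = 0.15`) and all function names are hypothetical choices.

```python
import math


def wakefield_log_abf(beta, se, prior_sd=0.15):
    """Log approximate Bayes factor (association vs. no association) for one SNP.

    Uses Wakefield's approximation with a normal prior N(0, prior_sd**2) on the true effect.
    """
    v = se ** 2
    w = prior_sd ** 2
    r = w / (v + w)
    z = beta / se
    return 0.5 * (math.log(1.0 - r) + r * z * z)


def single_causal_pp(betas, ses, prior_sd=0.15):
    """Posterior probability that each SNP is the causal one, assuming exactly one causal
    SNP in the region and a uniform prior over SNPs."""
    log_abfs = [wakefield_log_abf(b, s, prior_sd) for b, s in zip(betas, ses)]
    top = max(log_abfs)  # subtract the maximum before exponentiating to avoid overflow
    abfs = [math.exp(x - top) for x in log_abfs]
    total = sum(abfs)
    return [a / total for a in abfs]


def credible_set(snps, pps, level=0.95):
    """Add SNPs in decreasing order of posterior probability until the cumulative
    probability reaches `level`; returns the SNP names in that order."""
    ranked = sorted(zip(snps, pps), key=lambda pair: pair[1], reverse=True)
    chosen, cumulative = [], 0.0
    for snp, pp in ranked:
        chosen.append(snp)
        cumulative += pp
        if cumulative >= level:
            break
    return chosen


if __name__ == "__main__":
    # Toy region of four SNPs, not study data.
    snps = ["rs1", "rs2", "rs3", "rs4"]
    betas = [0.060, 0.055, 0.020, 0.005]
    ses = [0.010, 0.010, 0.010, 0.010]
    pps = single_causal_pp(betas, ses)
    for snp, pp in zip(snps, pps):
        print(snp, round(pp, 3))
    print("95% credible set:", credible_set(snps, pps))
```

In the toy region, two SNPs with similar z scores share the posterior mass, so neither reaches 0.95 alone and the credible set contains both; a single SNP with a much stronger signal would form a one-SNP credible set, as reported for the WHRadjBMI signals.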
For the other trans-ancestral loci, the posterior probabilities were lower (0.09 to 0.42), yet four loci (rs9860730, rs17375290, rs4324883, and rs9463108) still had relatively few SNPs (<10) in their 95% credible sets, suggesting a narrow window (combination of variants) around the causal variant. For example, functional annotations of rs17375290, the lead GWAS SNP in the NFIA locus associated with height, show promoter marks in muscle, a CADD score of 13.29 (a CADD score >10 places a variant among the top 10% of potentially deleterious variants), and an eQTL with FGGY in osteoclasts (Tables S22 and S23). Three other SNPs in the credible set (rs599989, rs1762881, and rs17121184) have nominally significant eQTLs (p = 0.005-0.01) with FGGY in osteoclasts but are not in high LD with rs17375290 (r² = 0.03-0.1). Diseases associated with FGGY include autosomal recessive lateral sclerosis and spastic paraplegia type 7, which are known to affect height.
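The CADD threshold quoted above follows from how PHRED-scaled CADD scores are defined: a score S means that the variant ranks within the top 10^(−S/10) fraction of all possible single-nucleotide variants, so 10 corresponds to the top 10% and 20 to the top 1%. A small helper (hypothetical name) converts the CADD scores cited in this article into approximate percentiles.

```python
def cadd_top_fraction(phred_score):
    """Approximate fraction of all possible SNVs scoring at or above a CADD PHRED score.

    PHRED-scaled CADD is -10 * log10(rank / total), so rank / total = 10 ** (-score / 10).
    """
    return 10 ** (-phred_score / 10)


if __name__ == "__main__":
    for label, score in [("rs17375290 (NFIA)", 13.29), ("rs11605693 (C11orf63)", 17.1)]:
        print(f"{label}: CADD {score} -> top {cadd_top_fraction(score):.1%}")
```

By this definition, a score of 13.29 places rs17375290 at roughly the top 4.7% of possible variants, and 17.1 places rs11605693 at roughly the top 1.9%.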
Of the 33 trans-ancestral signals in known loci, 31 had configurations with more than one putative causal SNP (i.e., more than one credible set). This is expected, given that these loci harbor multiple independent signals, as shown by our earlier conditional analyses. Among the putative causal SNPs within each locus, a number represented known signals (either the exact SNP or an SNP in high LD in all ancestries), and for many of these, the credible sets contained <10 SNPs. Among the 33 signals in known loci, 26 included a putative causal SNP that was either the lead GWAS SNP reported here or an SNP in high LD (r² > 0.75) with it, suggesting causality for the signal in general, although the signal was perhaps not initially described at the most likely causal SNP(s). For these putatively causal SNPs, the posterior probabilities ranged from 0.09 to 1. Twenty-two of these SNPs had 95% credible sets containing <10 SNPs, and 15 also had a posterior probability ≥0.8.
Many of these SNPs have functional annotations that support the fine-mapping results (Tables S22 and S23). For example, we find eQTLs for the three BMI signals, and enhancer marks for rs4807179, in relevant tissues, including adipose, brain, muscle, and/or thyroid. The lead SNPs of these credible sets had posterior probabilities >0.75, and the credible sets included <10 SNPs. Of the 28 identified height signals, 13 have a putatively causal SNP that is the lead GWAS SNP or in high LD (r² > 0.75) with it, has <10 SNPs in its credible set, and has eQTLs in relevant tissues, including muscle, thyroid, adipose, lung, and osteoclasts. Some also have promoter or enhancer marks in some of the same tissues. The two WHRadjBMI signals each have three SNPs in their most probable causal configuration. For each region, one of these causal SNPs is either the lead GWAS SNP (rs7975017) or an SNP in high LD (rs17099388 and rs6895040; r² = 1.0 in AFR, AMR, and EUR), has a posterior probability ≥0.95, and is the only SNP in its credible set. Furthermore, for rs7975017, we find eQTLs in thyroid for multiple genes (BHLHE41, SSPN, and AC022509.3 from GTEx) and enhancer marks in multiple tissues, including those related to WHRadjBMI, e.g., thyroid, muscle, fat, bone, and adrenal gland. Overall, across many of the loci and secondary signals, FINEMAP revealed SNPs with fairly strong prioritization (posterior probability ≥0.8) and, at some loci, putatively causal SNPs in small 95% credible sets, demonstrating the utility of trans-ancestral approaches to fine-mapping GWAS loci.
Trans-ancestral findings to account for population structure in previous GWAS
Previous height GWAS utilizing only European ancestry samples are known to exhibit signatures of residual stratification, which manifest as correlations between the effect size estimates of height-associated SNPs and geographical structure in Europe. [85][86][87][88] In theory, this bias should be lessened by adding non-European samples in a trans-ancestral GWAS, since the geographical structures within different continental ancestries are not expected to be correlated with each other. We tested this hypothesis empirically using the HISLA data. The first two PCs in the PCA of European populations (Figure S55) reflect geographical or population structure in Europe, corresponding to the north-south and southeast-southwest axes of variation, respectively. We found that the bias in effect size estimates due to stratification is most obvious for height, as this phenotype is known to differ across Europe. 85,89,90 Effect sizes on height estimated from GIANT and from our trans-ancestral meta-analysis were both significantly correlated with the loadings of the first PC (rho = 0.125, p = 3.2 × 10⁻⁹⁴ in GIANT; rho = 0.105, p = 3.4 × 10⁻⁷⁰ in the meta-analysis). The correlation was much lower in AAAGC and HISLA (rho = 0.012, p = 2.17 × 10⁻⁴ in AAAGC; rho = 0.007, p = 9.2 × 10⁻² in HISLA; Figure 3A). Importantly, the magnitude of correlation was lower in the meta-analysis than in GIANT alone (p = 6.6 × 10⁻⁹), consistent with our hypothesis. Other traits were not known a priori to be as differentiated across Europe as height, and thus the correlations between effect sizes and PC loadings are much lower in GIANT (e.g., rho = −0.025 for BMI; Figures 3B-3E).
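The stratification check above correlates, across SNPs, each SNP's estimated effect on a trait with its loading on a European principal component. The section does not name the correlation estimator; assuming rho denotes Spearman's rank correlation, the sketch below computes it with average ranks for ties, using only the standard library. The pairing of inputs (one allele-aligned effect estimate and one PC1 loading per SNP) and the function names are assumptions for illustration, and the test used to compare GIANT with the meta-analysis is not reproduced.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # 0-based positions start..end -> 1-based mean rank
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(effect_sizes, pc_loadings):
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(effect_sizes), average_ranks(pc_loadings))


if __name__ == "__main__":
    # Toy values for five SNPs, not study data.
    effects = [0.010, -0.004, 0.007, 0.002, -0.001]
    pc1_loadings = [0.8, -0.5, 0.3, 0.1, 0.0]
    print(round(spearman_rho(effects, pc1_loadings), 3))
```

A rank-based statistic is insensitive to a few SNPs with extreme effects or loadings, which matters when correlating hundreds of thousands of small effect estimates.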

Discussion
Hispanic/Latinos are a unique population with continental admixture from the Americas, Africa, and Europe, [10][11][12][13][14] yet they are underrepresented in GWAS. Herein, we present results from a large-scale meta-analysis of anthropometric traits in an ancestrally diverse sample of Hispanic/Latino adults (Figures S1-S3). We assembled a landmark consortium of Hispanics/Latinos to discover and map a total of 6 novel loci and 36 novel signals using both Hispanic/Latino population-specific and trans-ancestral discovery efforts (Figure 1). Numerous previously reported anthropometric SNP associations were suggestively (at nominal significance) or strongly (at Bonferroni significance) transferable to Hispanic/Latino adults. For example, between 16% and 31% of anthropometric variants transferred to Hispanic/Latino adults, depending on the trait or the sex-specific analysis (Tables S11-S15).
In total, 67 previously reported loci reached genome-wide significance in our Hispanic/Latino adult sample at the same index SNP or another lead SNP, the majority of which were in high LD in 1000 Genomes EUR or AMR (Table S7). Moreover, we observed that four of seven of our HISLA findings were transferable to other ancestral populations at nominal significance. Although these findings provide additional evidence for the transferability of common loci for anthropometric traits, 91 a number of previously reported anthropometric loci may still not be transferable to this population, in part because of variability in allele frequencies and effect sizes across ancestral populations, or because of our relatively smaller sample compared with European consortia. 55 Thus, absence of generalization does not equate to a lack of relevance to Hispanic/Latino adults or children, especially given that Hispanic/Latinos are an understudied population in genetic research and larger or comparable sample sizes are currently unavailable.
Our conditional and fine-mapping analyses revealed 36 signals in established anthropometric loci, which independently replicated in HISLA stage 2 or the UKBB British subsample. In addition, our lead SNPs for the BMI signals discovered at ADCY5 (from the HISLA meta-analysis) and ADAMTS9-AS2 (from the trans-ancestral meta-analysis) are both nominally associated with obesity status in children aged 2 to 18 years. Three of our trans-ancestral signals in established height loci were also associated with height-for-age Z scores in children/adolescents aged 5 to 18 years. These observations support the premise that diverse and trans-ancestral studies are a valuable tool for leveraging ancestral differences and similarities, both within and across populations, to identify multiple signals in established association regions, identify putative variants that may account for some of the missing heritability of complex diseases, or reveal promising genes and SNPs for functional follow-up.
In light of the notable ancestral, geographical, or environmental diversity of the samples analyzed in our meta-analyses, we observed evidence of allele frequency differences for many of our Hispanic/Latino (Figure 4) and trans-ancestral findings (Figure S53). Similar to reports from other diverse genome-wide analyses, 55 this allele frequency heterogeneity may explain the heterogeneity in effects seen across consortia in our trans-ancestral HISLA, AAAGC, and GIANT meta-analysis (e.g., I² = 78.7% for IGF2BP2 and I² = 84.4% for MYO6; Tables 2 and S16). Our use of fixed-effect meta-analyses may have failed to identify loci with effect heterogeneity unrelated to allele frequency or LD differences across populations; future studies exploring sources of heterogeneity in large, diverse datasets should address this limitation by considering trans-ancestral random-effects meta-analysis, local ancestry, and haplotype analyses. These observations reinforce how studies of one predominant ancestry group, such as Europeans, may fail to identify additional loci or, more likely, new signals in known loci that have allele frequency differences across ancestral populations.
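The I² values quoted above summarize how much of the between-consortium variation in effect estimates exceeds what sampling error alone would produce. A minimal sketch of Cochran's Q and I² under inverse-variance weights follows; the function name and toy inputs are hypothetical, and the software used in the study may compute these quantities slightly differently.

```python
def heterogeneity(betas, ses):
    """Cochran's Q and I^2 (as a percentage) for one SNP's per-study effect estimates.

    Q is the inverse-variance-weighted sum of squared deviations from the fixed-effect
    pooled estimate; I^2 = max(0, (Q - df) / Q) with df = number of studies - 1.
    """
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, i2


if __name__ == "__main__":
    # Toy estimates for three consortia, not study data.
    q, i2 = heterogeneity([0.08, 0.02, 0.03], [0.01, 0.015, 0.005])
    print(f"Q = {q:.2f}, I^2 = {i2:.1f}%")
```

When Q is at or below its degrees of freedom, I² is truncated at zero, so values such as 78.7% or 84.4% indicate that most of the observed spread across the three consortia is attributed to true differences in effect rather than sampling noise.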
Residual uncorrected stratification in GWAS can result in biased effect size estimates. 34 For example, effect sizes on height from GIANT were reported to be significantly correlated with the north-south axis of variation in Europe, suggesting residual uncorrected stratification, [85][86][87] which we also observe here. Note that the residual stratification is subtle, and while the effect sizes may be biased, this does not imply that the identified associations are spurious. For example, the genetic correlation between height effect sizes from GIANT and those from UKBB, which is based on a single homogeneous population and therefore achieves better control of population stratification, was 0.94. 85 Of the three traits studied here, height is the most stratified in Europe. The correlation coefficient between effect sizes on height and PC1 loadings reached 0.125 in GIANT alone, whereas it was much smaller for other traits (e.g., the maximum |rho| was 0.042, for male-only WHR in GIANT on PC1). The decrease in bias in the trans-ancestral meta-analysis was also obvious for height. The correlation with PC1 was non-significant in HISLA (rho = 0.007) and statistically significant but weak in AAAGC (rho = 0.012), consistent with a decreased impact of European population stratification on effect size estimates in AAAGC and HISLA. This decreased correlation could be due to the large non-European ancestry components of these populations (African and Native American, respectively), which make them less affected by population stratification within Europe; it could also be that, by using European ancestry-based loadings, we are less likely to detect non-European population stratification patterns, or that the smaller sample sizes of these cohorts result in noisier effect size estimates.
Regardless of the reason, compared with GIANT alone, the trans-ancestral meta-analysis of the three cohorts showed less impact of uncorrected stratification on effect size estimates, even though the sample sizes of AAAGC and HISLA are comparatively small. For other traits, the conclusions are qualitatively similar: trans-ancestral meta-analysis lessened the bias due to stratification, even though the bias in GIANT was not as strong in the first place.
Gene expression and bioinformatic analyses of our population-specific findings (Tables S18-S20) and our trans-ancestral findings in newly discovered loci (Tables S22 and S23) revealed important insights into the underlying biology of obesity, bone development, and growth. For example, the previously reported BMI locus ILRUN has also been associated with adult height 92,93 and height change during puberty. 94 The previously described BMI signal was led by rs205262, an eQTL for another gene within the region (SNRPC) in European ancestry samples. 33 A second signal (rs75398113) has also been reported at SNRPC for extremes of the BMI distribution. 95 Yet our signal, led by rs148899910, is more than 300 kb away and in low LD with these two index SNPs (r² = 0.01-0.05 in 1000 Genomes AMR). More recently, rs148899910 has been associated with height in Korean women. 96 Furthermore, variants in high LD with rs148899910 in 1000 Genomes AMR are associated with type 2 diabetes in individuals of East Asian ancestry 97 (rs4711389 has r² = 0.9 with rs148899910 in 1000 Genomes AMR) and with BMI-adjusted waist circumference in individuals of European ancestry (rs202228093 and rs2780226 each have r² > 0.7 with rs148899910 in 1000 Genomes AMR). 44,98 Using whole blood gene expression data from 606 participants of the Cameron County Hispanic Cohort, we find evidence that our BMI signal at rs148899910 is an eQTL for increased expression of C6orf1 (p = 3 × 10⁻⁷) and not of any other gene in the region (Table S18). Taken together, this signal shows associations across a wide array of anthropometric phenotypes.
In general, the lead SNPs from our HISLA-only meta-analyses appear relatively benign (not pathogenic) based on CADD and FATHMM-XF scores (Table S20). Nevertheless, all of these SNPs potentially alter regulatory motifs. Both rs17361324 (ADCY5) and rs215226 (B4GALNT3) have enhancer and promoter histone marks and are eQTLs for their respective genes in relevant tissues. For BMI, rs17361324 is an eQTL for ADCY5 in thyroid, and ADCY5 has been previously associated with type 2 diabetes, 99 BMI, 100 central obesity traits, 39 height, 47 birth outcomes, [101][102][103] and a number of other phenotypes. In addition, rs17361324 is near an ADCY5 intronic variant (rs1093467, r² = 0.3 in 1000 Genomes AMR) that is highly conserved across species (HaploReg v4.1). For height, rs215226 is an eQTL for B4GALNT3 in aortic and coronary artery and in tibial nerve tissues. The lead SNP for the height signal in B4GALNT3, rs215226, has enhancer histone marks in bone and muscle and promoter marks in muscle tissue. In addition, rs215226 (B4GALNT3) has a posterior probability of 1 in the FINEMAP analyses (see Table S9). Further information about these regions is provided in Table S19.
The lead SNPs in our trans-ancestral loci were mainly located in intronic and intergenic regions (Table S22) and appeared benign. One exception was the height locus C11orf63, led by rs11605693, which showed pathogenic CADD and FATHMM-XF scores (CADD score = 17.1 and FATHMM-XF score = 0.87). This lead SNP is an eQTL for C11orf63 in adipose, tibial nerve, and testis. C11orf63 (junctional cadherin complex regulator) is involved in the function of ependymal cells, which line the brain and spinal cord.
Among the trans-ancestral findings, a BMI signal in the established locus RNH1 was led by rs10540 (posterior probability 0.82), which is an eQTL for a wide range of tissues and genes (see Tables S21 and S23). Another signal in a known height locus, led by rs12918773, has a posterior probability of 0.98 and is one of four causal variants suggested by fine-mapping in the locus (Table S21); it is also an eQTL (in lung, thyroid, tibial nerve and artery, breast, and testis) for CDK10, a gene associated with growth retardation. 104 In addition, rs1342330 led the newly discovered signal in a known height locus; it has a RegulomeDB score of 2b (lower scores indicate stronger evidence of regulatory function) and several enhancer and promoter histone marks in relevant tissues (Table S22). This intronic variant is an eQTL in the pancreas for PHACTR2 (Table S23), which has been associated with body dysmorphic disorder. 105 While many of our discovered loci/signals appeared to be benign based on CADD and FATHMM-XF scores, they still show enhancer and promoter histone marks in trait-relevant tissues, such as adipose tissue, bone, muscle, thymus, brain, and adrenal gland.
As described above, in this study we were able to (1) discover six additional loci with a notably smaller analytic sample than other anthropometric consortia, such as GIANT; (2) discover 36 signals in established loci in HISLA or our trans-ancestral meta-analysis; and (3) generate trans-ancestral effect estimates with better control for population structure. Taken together, these findings indicate the added value of building larger, more diverse GWAS in the near future.
Large-scale analyses of diverse populations hold great potential for advancing the field of genetic epidemiology. 55 This study illustrates how studying admixed populations, such as Hispanics/Latinos, and highlighting them in trans-ancestral epidemiologic investigations can yield additional insights into the genetic architecture of anthropometric traits. Future discovery efforts in Hispanic/Latino populations and other ancestrally diverse populations will help address the concerning research gap between who is studied and who is affected by conditions such as obesity, to the benefit of both public health and precision medicine.

Variability in HISLA stage 1 + 2, AAAGC, and GIANT p values, effect sizes, and coded allele frequencies for genome-wide significant anthropometric loci from HISLA stage 1. AAAGC, African American Anthropometry Genetics Consortium; BMI, body mass index; CAF, coded allele frequency; GIANT, Genetic Investigation of Anthropometric Traits; HISLA, Hispanic/Latino Anthropometry Consortium; WHRadjBMI, waist-to-hip ratio adjusted for BMI. *SNPs that remained significant (p < 5 × 10⁻⁸) in HISLA stage 1 + 2.

Data and code availability
The HISLA meta-analysis results (