Ancestral diversity improves discovery and fine-mapping of genetic loci for anthropometric traits - the Hispanic/Latino Anthropometry Consortium

Hispanic/Latinos have been underrepresented in genome-wide association studies (GWAS) for anthropometric traits despite notable anthropometric variability with ancestry proportions, and a high burden of growth stunting and overweight/obesity in Hispanic/Latino populations. To address this knowledge gap, we analyzed densely-imputed genetic data in a sample of Hispanic/Latino adults to identify and fine-map common genetic variants associated with body mass index (BMI), height, and BMI-adjusted waist-to-hip ratio (WHRadjBMI). We conducted a GWAS of 18 studies/consortia as part of the Hispanic/Latino Anthropometry (HISLA) Consortium (Stage 1, n=59,769) and validated our findings in 9 additional studies (HISLA Stage 2, n=9,336). We conducted a trans-ethnic GWAS with summary statistics from HISLA Stage 1 and existing consortia of European and African ancestries. In our HISLA Stage 1+2 analyses, we discovered one novel BMI locus, as well as two novel BMI signals and another novel height signal, each within established anthropometric loci. In our trans-ethnic meta-analysis, we identified three additional novel BMI loci, one novel height locus, and one novel WHRadjBMI locus. We also identified three secondary signals for BMI, 28 for height, and two for WHRadjBMI. We replicated >60 established anthropometric loci in Hispanic/Latino populations at genome-wide significance, representing up to 30% of previously-reported index SNP anthropometric associations. Trans-ethnic meta-analysis of the three ancestries showed a small-to-moderate impact of uncorrected population stratification on the resulting effect size estimates. Our novel findings demonstrate that future studies may also benefit from leveraging differences in linkage disequilibrium patterns to discover novel loci and additional signals with less residual population stratification.


INTRODUCTION
A complex interplay between political, social, and economic factors has led to an increasingly obesogenic global environment. In this modern context, many low- to middle-income nations have experienced a rapid transition from under-nutrition and growth stunting to overnutrition and obesity. 1 Moreover, population-based surveys show that there is an inverse ecologic relationship between the prevalence of growth stunting and the prevalence of overweight seen among preschool children (0-5 years of age) in Latin America. 2 Growth stunting of preschool children ranges from relatively rare (7%) in the Caribbean to notably common (20%) in Central America. Moreover, it is a risk factor for overweight/obesity independent of a child's socioeconomic status.
In Latin America, by 2016 35% of the total population was overweight [body mass index (BMI) 25 to <30 kg/m²] and another 23% was living with obesity. 3 In Mexico, more than 71% of adults are currently overweight; 4 it is projected that by 2050 only 12% of men and 9% of women will have a healthy weight (BMI <25 kg/m²). In a recent study in Argentina, Chile, and Uruguay, the prevalence of obesity was 36%, but when using waist circumference as a measure of central obesity, it was far higher (53%). 5 Within each of these populations, there are also disparities in obesity by sex and education.
Race, ethnicity, and ancestry may play a role in anthropometric-related health disparities in Latin America. Previous studies have described the historical contexts leading to admixture in Latin American populations 6; 7 as characterized by highly diverse (variable) ancestral proportions [8][9][10] from any of the following regions: the Americas, Europe, Africa and East Asia. [11][12][13][14][15][16] In fact, the proportion of Native American ancestry is associated with numerous biomedical traits, such as obesity-related traits, and is most strongly associated with height. 17; 18 Height is inversely associated with proportion of Native American ancestry, even after taking into account the fact that globally over time populations have become taller due to mainly non-genetic nutritional factors. 16 The ultimate drivers of this association remain to be elucidated; it is possible that genetic factors and/or socio-economic factors strongly associated with Native American ancestry could be responsible for these findings. Recent studies are starting to provide relevant insights on this topic. As an example, a recent genome-wide association study (GWAS) in Peru 19 identified a missense variant in the FBN1 gene (rs200342067) that has the largest effect size so far described for common height-associated variants in human populations (each copy of the minor allele reduces height by 2.2 cm). In the 1000 Genomes Project samples, rs200342067 is only present in two American samples (MXL: 0.78% and PEL: 4.12%), and yet the authors reported that this missense variant shows subtle evidence of positive selection in the Peruvian population. 19
Obesity in Latin America has quickly surpassed the levels previously seen only among adults of high-income nations, like Canada and the United States (US). In Canada the number of people reporting Latin American origins grew by 83% between the 2001 census 20 and the 2016 census, 21 representing 1.3% of the total Canadian population.
In the US, both the population size and diversity in national origins (backgrounds) of US Hispanic/Latinos have been increasing over the past several decades. 22 If past demographic trends continue, 24% of the US adult population will identify as Hispanic/Latino by 2065. 22 Obesity-related financial costs in the US are projected to double every decade to ~$900 billion by 2030. 23; 24 US Hispanic/Latino adults and their children/adolescents face a greater burden of obesity than their non-Hispanic white counterparts. [25][26][27][28] There is a need to study Hispanic/Latino populations in order to address these disparities. 28; 29 Given the unique historical and recent demographic shifts occurring across the Americas, there is a clear need to also understand the role that Native American or other understudied components of admixture have on the genetic architecture of anthropometric traits in Hispanic/Latinos, and its relationship with risk of downstream poor health outcomes. Yet, to date no large-scale GWAS of anthropometric traits have been conducted among Hispanic/Latino populations. Here, we perform the largest genomic study to date of anthropometric traits, including BMI, height, and waist-to-hip ratio adjusted for BMI (WHRadjBMI) in Hispanic/Latino populations to describe what might be novel loci or signals in established loci in this population by sex and life stage.

Hispanic/Latino Study Samples
The Hispanic/Latino Anthropometry (HISLA) Consortium comprises 27 studies/consortia of adult participants. First, HISLA Stage 1 includes 17 studies and one consortium (Consortium for the Analysis of the Diversity and Evolution of Latin America, CANDELA 18 ) collectively representing up to 59,771 adults, depending on the trait, from Brazil, Chile, Colombia, Mexico, Peru, or the US with self-reported heritage from across Spanish-speaking Latin America, or Native American heritage, primarily Pima and Zuni 30 (Table S1). HISLA Stage 2 includes nine studies with up to 10,538 adults from across Spanish-speaking Latin America or with related heritage and living in the US (Table S1).

This study was approved by the Institutional Review Boards of the University of North Carolina at Chapel Hill, and all contributing studies had received prior Institutional Review Board approval for each study's activities.

Anthropometric Traits
BMI is a commonly derived index of obesity risk and is calculated as the ratio of body weight to height squared (kg/m²). Adult height was measured or self-reported in either metric or US units and then converted to meters. Waist-to-hip ratio (WHR) is used to capture central fat deposition, and it is derived from the circumference of the waist at the umbilicus compared to the circumference of the hip at the maximum protrusion of the gluteal muscles.
Residuals were calculated by sex and/or case status, adjusting for age, age², and study-specific covariates [e.g., center, principal components of ancestry (PCA)]. For WHR, BMI was also adjusted for when creating the residuals to isolate the central deposition of fat from overall body mass. Residuals were then used to create inverse normalizations of BMI and WHRadjBMI, and z-scores of height (=residual/standard deviation for all residuals). In family-based studies the residuals were calculated in women and men together, adjusting for age and sex and other study covariates including PCs. Descriptive statistics on the covariates and anthropometric measures are provided for each study's analytic sample in Table S2. Only one family-based study in Stage 1 and two non-family based studies in Stage 2 (GOLDR 0.3% <18 years, and HTN-IRS 3.9%) included a small subset of adolescents aged 15-17 years, each less than 5% of the total sample. All other study samples included individuals 18-98 years of age.
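As a minimal sketch of the trait derivation described above (not the consortium's actual pipeline), the following Python computes BMI and WHR, applies a rank-based inverse normal transformation to a list of residuals, and scales height residuals to z-scores. The function names, the rank offset of 0.5, and the use of the sample standard deviation are illustrative assumptions; the text does not specify these details, and the regression step that produces the residuals is omitted.

```python
from statistics import NormalDist, stdev


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight divided by height squared (kg/m^2)."""
    return weight_kg / height_m ** 2


def whr(waist: float, hip: float) -> float:
    """Waist-to-hip ratio from two circumferences in the same unit."""
    return waist / hip


def inverse_normal(residuals: list[float], offset: float = 0.5) -> list[float]:
    """Rank-based inverse normal transform of residuals.

    Each value is replaced by the standard normal quantile of
    (rank - offset) / n. The offset is an assumption, and ties are
    broken by input order for simplicity.
    """
    n = len(residuals)
    order = sorted(range(n), key=lambda i: residuals[i])
    transformed = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        transformed[idx] = NormalDist().inv_cdf((rank - offset) / n)
    return transformed


def height_z_scores(residuals: list[float]) -> list[float]:
    """Height z-score: residual divided by the SD of all residuals."""
    sd = stdev(residuals)  # sample SD; an assumption
    return [r / sd for r in residuals]
```

In practice the residuals would come from a sex-stratified regression on age, age², and study-specific covariates, as described in the text.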

Childhood/Adolescence Study Samples, Anthropometric Traits, and Obesity
We assembled an independent sample of children/adolescents with anthropometrics from three studies from the US, Mexico and Chile (Table S3). The distribution of covariates and anthropometrics of the samples of children/adolescents in each analysis is described in Table S4. First, childhood/adolescent obesity was defined as having a ≥95th BMI-for-age percentile versus ≤50th BMI-for-age percentile, based on the Centers for Disease Control and Prevention growth curves, 31 as done in previous analyses of childhood obesity. 32 We used these two analyses to look up novel BMI and height findings from our adult HISLA meta-analysis and our trans-ethnic analyses. This resulted in 1,814 children/adolescents aged 2-18 years for this case-control analysis (Tables S3-4). Second, BMI and height-for-age z-scores were calculated in children/adolescents aged 5-18 years from the US and Chile (Table S4) based on the more international reference growth curves from the World Health Organization. 33 In Viva la Familia, a family-based study, 34 these residuals were calculated adjusting for sex in the combined sample.
The resulting BMI and height-for-age z-scores were available for 1,914 and 1,945 children/adolescents, respectively.

SNP Imputation and Statistical Analyses
We generated autosomal genome-wide imputed data based on 1000 Genomes Phase 1 and 3 references, with the exception of two studies that contributed Exomechip and MetaboChip (Illumina, Inc.; San Diego, CA) genotypes and one study that blended genotypes from multiple platforms (Tables S5-6). PCA analyses were conducted in each study to capture the main components of genetic ancestry from the Americas, Europe, Africa, and Asia. Studies with samples from related individuals accommodated this non-independence by projecting their principal component analysis from the reference to the study sample, and by accounting for relatedness using either generalized estimating equations 35 or mixed linear models. 10; 36 Assuming an additive genetic model, we tested the association of over 20 million autosomal variants on our traits, accounting for all trait or study-specific covariates (e.g., center, PCA).

Meta-Analyses of HISLA Stage 1+2
The studies of the HISLA Consortium were meta-analyzed in two stages, including discovery (Stage 1) and validation (Stage 2). Stage 1 included a total sample of 59,771 individuals with data on BMI, 56,161 with height, and 42,455 with WHRadjBMI. All Stage 1 studies/consortia provided full genome-wide analysis results. All SNPs that met our significance criteria were brought forward for validation in Stage 2, which included 10,538 individuals with data on BMI, 8,110 with height, and 4,393 with WHRadjBMI. All reported association results passed our quality control criteria; i.e., variants with low imputation quality (info score <0.4 or Rsq <0.3), minor allele count (MAC) <5, or sample size <100 were removed. We meta-analyzed effects across all studies using a fixed-effect inverse variance weighted meta-analysis with genomic control in METAL. 37 Given the unique patterns of admixture and ancestry represented by the Brazilian or Native American samples, we conducted sensitivity analyses in Stage 1 studies (i.e., comparing the inclusion and exclusion of the Baependi Heart Study, 1982 Pelotas Birth Cohort Study, and Family Investigation of Nephropathy and Diabetes substudy of individuals of Pima and Zuni heritage) to assess the influence of the three studies on the meta-analysis results. CANDELA was retained in all analyses as <10% of the consortium's samples came from Brazil, primarily originating from the South of Brazil, where European heritage is widespread and Native American or African admixture is present to a lesser extent. 18 Regional plots of all GWAS-significant HISLA Stage 1 findings were generated using LocusZoom (https://locuszoom.org). From Stage 1, we selected for replication lead variants that met genome-wide significance (P<5x10⁻⁸) and were independent of each other. In cases where Stage 2 studies did not have the lead variant, we selected two proxies per lead variant with an r² ≥0.9 using 1000 Genomes AMR linkage disequilibrium (LD).
Stage 2 studies provided a list of the requested lead variants and/or their proxies from Stage 1 for validation. Stage 2 studies were meta-analyzed and subsequently combined with Stage 1 using METAL 25
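The fixed-effect inverse variance weighted approach named above can be illustrated with a short Python sketch. This is not METAL itself: the function names are hypothetical, the quality filter covers only the info score, MAC, and sample size thresholds from the text, and genomic control is sketched as inflating each study's standard errors by the square root of lambda, which is an assumption about how the correction is applied.

```python
import math
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def passes_qc(info: float, mac: float, n: int) -> bool:
    """Keep variants with info >= 0.4, MAC >= 5, and sample size >= 100."""
    return info >= 0.4 and mac >= 5 and n >= 100


def genomic_control_lambda(betas: list[float], ses: list[float]) -> float:
    """Lambda_GC from one study's genome-wide results, floored at 1."""
    chi2 = [(b / s) ** 2 for b, s in zip(betas, ses)]
    return max(1.0, median(chi2) / CHI2_1DF_MEDIAN)


def gc_corrected_se(se: float, lam: float) -> float:
    """Inflate a standard error by sqrt(lambda)."""
    return se * math.sqrt(lam)


def ivw_fixed_effect(betas: list[float], ses: list[float]) -> tuple[float, float]:
    """Pooled effect and standard error for one variant across studies.

    Each study is weighted by the inverse of its variance (1 / se^2).
    """
    weights = [1.0 / s ** 2 for s in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    return beta, math.sqrt(1.0 / total)
```

For example, two studies with effects 0.1 and 0.3 and equal standard errors of 0.1 pool to an effect of 0.2 with a standard error of about 0.071.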

Meta-Analyses of HISLA Stage 1 with Other Ancestral Consortia
In addition to a Hispanic/Latino only meta-analysis, we combined the HISLA Stage 1 meta-analysis with data from previous large-scale GWAS meta-analyses from European (the Genetic Investigation of Anthropometric Traits, GIANT, Consortium [38][39][40] , N ~ 300,000) and/or African (the African Ancestry Anthropometry Genetics Consortium, AAAGC 41 , N ~ 50,000) descent populations. We used fixed-effect inverse variance weighted meta-analytic techniques in METAL to generate our trans-ethnic meta-analysis. 37 We validated our potentially novel BMI, height 42 and WHRadjBMI 43 findings from this trans-ethnic meta-analysis in either our independent sample of Hispanic/Latino children/adolescents or the British subsample GWAS of the United Kingdom Biobank (UKBB). Regional plots of these analyses of all potentially novel trans-ethnic findings are shown in the supplement (Figures S7-52).

Thresholds for Conditional Signals, Discovery, Validation and Transferability
We conducted approximate conditional analyses using the Genome-wide Complex Trait Analysis (GCTA) software. SNP associations were then defined as either newly discovered or established, depending on their location. An established locus was defined as a SNP association within ±500 kb of at least one previously identified index SNP; otherwise the association was considered a newly-discovered locus.
We designated our Hispanic/Latino SNP-associations within either newly-discovered or established loci as novel if they met the following criteria: 1) were associated at P-value<5x10⁻⁸ in HISLA Stage 1 and directionally consistent in Stage 2, and 2) the addition of Stage 2 samples improved the estimated Stage 1+2 meta-analysis. For the trans-ethnic analyses these criteria were as follows: 1) were associated at P-value<5x10⁻⁸ in the combined HISLA, AAAGC and GIANT meta-analysis, and 2) were both directionally consistent and associated at P-value<5x10⁻² in the subsample of Hispanic/Latino children/adolescents or in the British subsample GWAS from the UKBB.
Novel Hispanic/Latino SNP effects were considered to transfer to Hispanic/Latino children/adolescents, or to African or European ancestry adults, if they were 1) directionally consistent, 2) associated at P-value<5x10⁻², and 3) had a heterogeneity of I² <75% in either the Hispanic/Latino children/adolescent lookups, or either 1) the AAAGC or 2) the GIANT adult GWAS results. Conversely, SNP effects of variants previously associated with anthropometric traits in non-Hispanic/Latino populations (i.e., index published SNPs) were considered to be transferable (generalizable) to Hispanic/Latinos if they were 1) directionally consistent, 2) displayed a P-value<5x10⁻², and 3) had little to moderate effect heterogeneity (I² <75%) in Stage 1.
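The locus definition and transferability criteria above amount to a few simple rules, sketched here in Python. The function names are hypothetical, positions are assumed to lie on the same chromosome, and I² is computed from Cochran's Q in the standard way; the text does not state how I² was obtained.

```python
def in_established_locus(pos: int, index_positions: list[int],
                         window: int = 500_000) -> bool:
    """True if a SNP lies within +/- window bp of any known index SNP."""
    return any(abs(pos - p) <= window for p in index_positions)


def i_squared(betas: list[float], ses: list[float]) -> float:
    """Heterogeneity I^2 (in percent) derived from Cochran's Q."""
    weights = [1.0 / s ** 2 for s in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    if q <= 0.0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


def is_transferable(beta_discovery: float, beta_target: float,
                    p_target: float, i2: float,
                    alpha: float = 0.05, i2_max: float = 75.0) -> bool:
    """Apply the three criteria: same direction, P < alpha, I^2 < i2_max."""
    same_direction = beta_discovery * beta_target > 0
    return same_direction and p_target < alpha and i2 < i2_max
```

A SNP 400 kb from a known index SNP would therefore count as falling in an established locus, whereas one 600 kb away would be treated as a newly-discovered locus.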

Fine-Mapping Methods
We used FINEMAP 66 for fine-mapping analyses of the newly-discovered loci identified as part of the HISLA Stage 1 meta-analysis or trans-ethnic meta-analysis, and in established loci. For trans-ethnic fine-mapping of the novel loci and signals identified in the trans-ethnic meta-analysis of HISLA, AAAGC, and GIANT, we used a 1 Mb region defining each locus using the summary statistics of the given meta-analysis. We calculated the LD for Hispanic/Latino samples using the HCHS/SOL 10 unrelated sample (N ~ 7,670). For African and European ancestry samples, we calculated the LD using the ARIC unrelated sample that included self-reported African ancestry (N ~ 2,800) and European ancestry (N ~ 9,700). We weighted the LD matrices by the GWAS sample sizes for each trait (HISLA range: ~42,400-56,100; AAAGC: 20,300-42,700; GIANT: 210,000-330,000).
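One plausible reading of the sample-size weighting described above is an element-wise weighted average of the ancestry-specific LD matrices. The Python sketch below illustrates that reading only; the function name is hypothetical, and the exact weighting formula used in the analysis is not given in the text.

```python
def weighted_ld(ld_matrices: list[list[list[float]]],
                sample_sizes: list[float]) -> list[list[float]]:
    """Element-wise average of square LD matrices, weighted by GWAS N.

    All matrices are assumed to cover the same SNPs in the same order.
    """
    total = sum(sample_sizes)
    m = len(ld_matrices[0])
    return [
        [
            sum(n * ld[i][j] for ld, n in zip(ld_matrices, sample_sizes)) / total
            for j in range(m)
        ]
        for i in range(m)
    ]
```

With two ancestries whose pairwise correlations are 0.2 and 0.8 and whose GWAS sample sizes are in a 1:3 ratio, the combined correlation would be 0.65.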
All regions allowed up to a maximum of 10 causal variants. The cumulative 95% credible set was calculated from the estimated posterior probabilities. Convergence failed for three regions (lead SNPs: rs2902635, rs6900530, and rs4425978, all in known height loci) using the stochastic approach. For these three regions, we used the conditional approach to determine the number of causal variants.
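The cumulative credible set calculation can be sketched as follows. This is a simplified illustration for a single causal signal, not FINEMAP's own procedure for multiple causal configurations; the function name and the SNP identifiers in the usage note are hypothetical.

```python
def credible_set(posteriors: dict[str, float],
                 coverage: float = 0.95) -> list[str]:
    """Smallest set of SNPs whose summed posterior reaches `coverage`.

    SNPs are added in order of decreasing posterior probability until
    the cumulative probability is at least `coverage`.
    """
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)
    chosen: list[str] = []
    cumulative = 0.0
    for snp, prob in ranked:
        chosen.append(snp)
        cumulative += prob
        if cumulative >= coverage:
            break
    return chosen
```

For posteriors of 0.89, 0.05, 0.04, and 0.02 on four hypothetical SNPs, the first three are needed to reach 95% coverage.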

To further validate our potentially novel or validated HISLA hits, we performed association analyses of measured whole blood gene expression in 606 individuals from the Cameron County Hispanic Cohort. 68 RNA sequencing was conducted using 150bp paired-end reads on the Illumina NovaSeq 6000 by Vanderbilt Technologies for Advanced Genomics.
Initial sequencing quality was checked by FastQC. 69 STAR-2.7.8a was applied to align sequencing reads to the human genome reference (UCSC, hg38), 70 and the aligned reads were assigned to genes using featureCounts. 71 We excluded samples with fewer than 15M total aligned reads, a rate of successful alignment of less than 20%, or fewer than 15M total assigned reads. The sequencing library size was normalized using DESeq2 72 and read counts were transformed using variance stabilizing transformations (vst in DESeq2 package).
We performed expression quantitative trait loci (eQTL) analysis with our top HISLA SNP findings, by modeling SNP dosages (exposure) in a linear regression of gene expression levels (outcomes), for each gene within the 1 MB interval around each lead SNP. We inverse normalized the gene expression levels and adjusted for age, sex, and three principal components to capture population substructure. Bonferroni correction for each region varied according to the number of SNPs tested.
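To make the sample filter and per-region testing concrete, the following Python sketch shows the three read-count exclusion rules, a Bonferroni threshold for a region, and an unadjusted regression slope of expression on SNP dosage. It is a simplification under stated assumptions: the function names are hypothetical, the family-wise alpha of 0.05 is assumed, and the covariate adjustment (age, sex, three principal components) used in the actual analysis is omitted.

```python
def keep_sample(aligned_reads: int, alignment_rate: float,
                assigned_reads: int) -> bool:
    """Retain samples with >= 15M aligned reads, >= 20% alignment rate,
    and >= 15M reads assigned to genes."""
    return (aligned_reads >= 15_000_000
            and alignment_rate >= 0.20
            and assigned_reads >= 15_000_000)


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance threshold for a region with n_tests tests."""
    return alpha / n_tests


def eqtl_slope(dosages: list[float], expression: list[float]) -> float:
    """Ordinary least squares slope of expression on dosage (no covariates)."""
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(dosages, expression))
    return sxy / sxx
```

A region with 10 tests would use a threshold of 0.005 under this assumed alpha.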
To gain further insight into the possible functional role of the identified variants and to assess their relevance to other phenotypes, we conducted bioinformatics queries of our potentially novel loci and novel signals within known loci in multiple publicly available databases, including PhenoScanner, 73 RegulomeDB, 74 Haploreg, 75 UCSC GenomeBrowser, 76 and GTEx. 77

Trans-Ethnic Findings to Account for Population Structure in Previous GWAS
To quantify the impact of population stratification, we computed the correlation between PC loadings and beta effects estimated from GWAS. We first conducted PCA on the four European populations (CEU, GBR, IBS, and TSI) from 1000 Genomes. We excluded the FIN (Finnish in Finland) population because of its known unique demographic history. 38 We only used biallelic SNPs with minor allele frequency (MAF) > 5% in the four European populations, and then pruned them by both distance and LD using PLINK 1.9. 78 Specifically, we pruned the dataset such that no two SNPs were closer than 2 kb, and then pruned using a 50 SNP LD window (moving in steps of 5 SNPs), such that no SNPs had r² >0.2. We further removed SNPs in regions of long-range LD. 79

One Novel BMI Locus Discovered and Validated in Hispanic/Latino Adults
The first goal of this study was to conduct a genome-wide meta-analysis of anthropometric traits in Hispanic/Latino adults to identify novel loci in an under-studied population (Figure 1). All regional plots of all GWAS-significant HISLA Stage 1 findings are shown in the supplement (Figures S1-6). No novel loci were identified in all samples combined.
Yet, when excluding the Brazilian or Native American samples from Stage 1, we discovered one locus for adult BMI at PAX3 on chromosome 2 in the HISLA Stage 1 sample (Table S7), and we validated this locus in HISLA Stage 2 (Table 1). The lead SNP, rs994108, is in moderate LD is a well-known transcription factor in normal embryonic neural crest development and differentiation. 85 Neural crest cells can give rise to mesenchymal stem cells, 86 which can in turn give rise to adipocytes; [86][87][88] thus, the possible role of PAX3 in adipogenesis may at least partially explain the association signal with BMI near this gene. Another BMI SNP (rs1505851) near ARRDC3 on chromosome 5 found at GWAS significance in HISLA Stage 1 (Table S7, Figure S1) did not validate in Stage 2 (Table 1).
We identified two WHRadjBMI loci at DOCK2 and TAOK3 at GWAS significance in HISLA Stage 1 after excluding the Brazilian and Native American samples (Table S7, Figures S2-S3), but neither met the P-value threshold for replication in HISLA Stage 2. The DOCK2 association for WHRadjBMI observed among women in Stage 1 was, however, directionally consistent among women in Stage 2. The TAOK3 association was led by a low frequency variant (rs115981023) that was not directionally consistent across Stages. rs115981023 exhibited moderate heterogeneity across Stage 1 samples after excluding Brazilian and Native American samples (I² =45%), and this heterogeneity remained (I² =52%) in the combined meta-analysis of HISLA Stage 1 and 2 samples (Table 1).
No potentially novel loci were identified for height in HISLA Stage 1, and the exclusion of the Brazilian and Native American samples did not reveal additional novel height or WHRadjBMI loci.

Three Novel Signals in Established Loci for BMI and Height Discovered and Validated in Hispanic/Latino Adults
At two established loci for BMI, we identified new signals at ADCY5 and near C6orf106, which has recently been renamed ILRUN (Table S7). These signals were both independent of any previously published anthropometric findings (Table S8, Figures S4-5). We validated these signals in Stage 2 with directional consistency and the combined Stage 1+2 meta-analysis at GWAS significance (Table 1). We also identified one new signal for height in an established height locus, B4GALNT3, which was independent of the previously reported SNPs for height (Tables S7-8, Figure S6). We validated this signal in Stage 2 with directional consistency and a Stage 1+2 meta-analysis that was GWAS significant (Table 1). In additional gene expression and bioinformatics analyses (Tables S18-20), we found that each of the three novel signals in an established anthropometric locus is supported by an eQTL in whole blood in Hispanic/Latino populations (Table S18), and/or an eQTL in other tissues from publicly available (non-Hispanic/Latino) datasets, e.g., thyroid, esophagus, artery (Tables S19-20).

Fine-Mapping of Novel Adult Hispanic/Latino Anthropometric Findings
Using 1 Mb regions, we fine-mapped the novel PAX3 locus for BMI and the three new signals in known loci discovered and replicated in Stages 1+2 (BMI: ADCY5 and C6orf106; height: B4GALNT3; Table S9). For the three BMI loci (PAX3, ADCY5, and C6orf106), FINEMAP revealed one potential causal set per locus. For the PAX3 locus, only one causal set was proposed and the 95% credible set contained only nine plausibly causal SNPs, with lead SNP rs994108 having a very high posterior probability of being causal (0.89, Table S21).
However, functional annotation of this SNP was unremarkable (Tables S22-23

Ancestral Backgrounds
To assess how well the effect estimates are transferable (generalizable) to other populations, we looked up the novel BMI and height findings from Hispanic/Latinos in the AAAGC and GIANT meta-analysis results (Table 1). Keeping limitations with respect to sample size, LD, allele frequency, and effect size heterogeneity in mind, we did observe directionally consistent BMI effects at the PAX3 locus in the other consortia, although without observing nominal significance. The new BMI signal at the ADCY5 locus (rs17361324) transferred to both AAAGC and GIANT with directional consistency (betas=0.13-0.23) and nominal significance (P-values<5x10⁻²). Although the BMI lead SNP (rs148899910) representing a novel signal near C6orf106 was available in AAAGC, the signal only appeared to be transferable to GIANT at a proxy SNP (rs1573905, r² =0.96-1 in 1000 Genomes AMR and EUR; Table 1).
The new signal for height in B4GALNT3 (rs215226) was directionally consistent and nominally significant in AAAGC. In all cases the effect sizes observed in GIANT and AAAGC were attenuated compared to the effect sizes from HISLA Stage 1.

Childhood/Adolescence
We looked up our novel HISLA findings in Hispanic/Latino children/adolescents using BMI-for-age and height-for-age z-scores, as well as a case-control study of childhood obesity.
Two of the three novel BMI signals were directionally consistent with the anticipated effect on the odds of obesity during childhood/adolescence, one of which was nominally significant (rs17361324 at ADCY5; P-value=2.2x10⁻²). None of the novel HISLA findings generalized at nominal significance to the BMI/height-for-age z-scores, but all were directionally consistent with the corresponding effect in adulthood (Table S10). This may have been due to the small available sample size of Hispanic/Latino children/adolescents.

Transferability of Established Anthropometric Loci to Hispanic/Latino Adults
Using HISLA Stage 1 results, we assessed how many established anthropometric loci, discovered in predominantly non-Hispanic/Latino samples could be transferred to Hispanic/Latino adults, given the current sample size. As shown in  (Table S7).

Context
As shown in Figure 1, we pursued a secondary goal of assembling a trans-ethnic meta-analysis of HISLA Stage 1 with the AAAGC and GIANT consortia results to attempt to further leverage differences in allele frequencies across populations to identify additional novel loci and fine-map established loci. As anticipated, this trans-ethnic meta-analysis revealed eight new loci and 35 new signals in established loci that were associated at GWAS significance in the combined HISLA, AAAGC and GIANT meta-analysis (Table S16, Figures S7-S52), and independent of established SNPs within a 10Mb region (Table 2). Of this set, five new loci (3 BMI, 1 height, and 1 WHRadjBMI) and 33 new signals in established loci (3 BMI, 28 height, and 2 WHRadjBMI) were validated using the adult British subsample of the UKBB. In some cases, the significance in the trans-ethnic results was driven more by the AAAGC and/or HISLA consortia, which could explain the lack of association in the UKBB British subsample (Table S16, Figure S53). We looked up the potentially novel findings from our trans-ethnic meta-analyses in the sample of Hispanic/Latino children/adolescents (Table S17).
Four trans-ethnic SNPs were associated at nominal significance in the child/adolescent sample, each having been already replicated in UKBB (Table S16). Three of these four loci were directionally consistent in the childhood/adolescence results with the trans-ethnic adult findings (Table S17). In summary, we found that two of the seven novel BMI/height trans-ethnic loci and 17 of the 33 new trans-ethnic BMI/height signals in established loci were directionally consistent between their adult directions of association and the BMI/height-for-age z-scores in children/adolescents. However, this directional consistency was not more than what would have been expected by chance alone (binomial P-values >0.10).
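The chance expectation for directional consistency can be checked with a simple binomial calculation, sketched below in Python. This assumes a one-sided test against a null probability of 0.5 for each SNP; the exact form of the binomial test used in the analysis is not stated in the text, and the function name is hypothetical.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float = 0.5) -> float:
    """Probability of observing k or more successes in n trials."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k, n + 1))
```

Under these assumptions, 17 consistent signals out of 33 gives a tail probability of 0.5, and two consistent loci out of seven gives 0.9375, both in line with the reported binomial P-values above 0.10.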

Fine-Mapping of Trans-Ethnic Anthropometric Findings
We also fine-mapped the novel trans-ethnic findings (Table S21). Within the 33 novel trans-ethnic signals in known loci, 31 had configurations with more than one putative causal SNP (e.g., more than one credible set). This made sense given these are loci with multiple independent signals, as described by our earlier conditional analyses.
Among the putative causal SNPs within each locus, there were a number of SNPs that represented previously-known signals (either the exact SNP or a SNP in high LD among all ancestries). We found that for many of these the credible sets contained <10 SNPs. Among the 33 novel signals in known loci, 26 included a putative causal SNP that is the lead GWAS SNP reported here or a SNP in high LD (r² > 0.75) with the lead GWAS SNP, suggesting causality for this signal in general, though perhaps not initially discovered at the most putatively causal SNP(s). For these 24 putatively-causal SNPs, the posterior probabilities ranged from 0.09 to 1. Twenty-two of these SNPs had 95% credible sets that contained <10 SNPs and 15 also had posterior probability ≥0.8.
Many have functional annotations that help support the fine-mapping results (Table S22-S23). For example, we find eQTLs for the three BMI signals (and enhancer marks for rs4807179) in relevant tissues including adipose, brain, muscle, and/or thyroid. The lead SNPs of these credible sets had posterior probabilities >0. 75

Trans-Ethnic Findings to Account for Population Structure in Previous GWAS
The first two PCs in the PCA (Figure S54) reflect geographical or population structure in Europe, corresponding to the North-South and Southeast-Southwest axes of variation, respectively. We found that the bias in effect size estimates due to stratification is most obvious for height as the phenotype is known to be differentiated across Europe. [90][91][92] Effect sizes on height estimated from the GIANT and our trans-ethnic meta-analysis were both highly correlated with the loadings of the first PC (rho = 0.125, P-value=3.2x10⁻⁹⁴ in GIANT; rho = 0.105, P-value=3.4x10⁻⁷⁰ in meta-analysis). The correlation was much lower in AAAGC and HISLA (rho = 0.012, P-value=2.17x10⁻⁴ in AAAGC; rho = 0.007, P-value=9.2x10⁻² in HISLA; Figure 4A).
Importantly, the magnitude of correlation was lessened in meta-analysis compared with GIANT (P-value=6.6x10⁻⁹). Other traits were not a priori known to be as differentiated across Europe as height, and thus the degree of correlation between effect sizes and PC loadings is much lower in GIANT (e.g. rho = -0.025 for BMI; Figure 4B-E).

DISCUSSION
Hispanic/Latinos are a unique population with continental admixture from the Americas, Africa and Europe [11][12][13][14][15] and are a population of great interest for anthropometric studies. Here, we present results from a large-scale meta-analysis of anthropometric traits in Hispanics/Latinos.
In the first effort of its kind, we assembled a large sample of Hispanics/Latinos to map a total of six novel loci and 36 novel signals using both Hispanic/Latino population-specific and trans-ethnic discovery efforts (Figure 1). More than 1,600 anthropometric-SNP associations were transferable at nominal significance to Hispanics/Latinos, representing 19-30% of all index sex-combined SNP-anthropometric associations (Tables S11-13). Sixty-seven previously reported loci reached GWAS significance at the same index or another lead SNP in our Hispanic/Latino adult sample (Table S7). Moreover, we established that four of seven of our novel HISLA findings were transferable to other ancestral populations at nominal significance.
We note that although these findings provide additional evidence for the transferability of common anthropometric loci, 93 a number of previously reported anthropometric loci may not be transferable to this population, in part because of variability in allele frequencies or effect sizes across ancestral populations. 59 Our conditional and fine-mapping analyses revealed 36 novel signals in established anthropometric loci, which independently replicated in HISLA Stage 2 or the UKBB British subsample. In addition, our lead SNPs for the novel BMI signals discovered at ADCY5 (from the HISLA meta-analysis) and ADAMTS9-AS2 (from the trans-ethnic meta-analysis) are both nominally associated with obesity status between 2 and 18 years of age. Three of our new trans-ethnic signals in established height loci also displayed association with height-for-age z-scores in children/adolescents between 5 and 18 years of age. These observations support our premise that diverse and trans-ethnic studies represent a valuable tool for identifying multiple signals and for fine-mapping in established association regions. Our overarching goal was to identify putative variants that account for some of the missing heritability of complex diseases and to reveal candidate genes and SNPs for functional follow-up.
In light of the notable ancestral, geographical and environmental diversity of the studies included in our meta-analyses, we observed evidence of allele frequency differences for many of our novel discoveries (Figure 3 and Figure S53). Similar to reports from other diverse genome-wide analyses, 59 in some cases this allele frequency heterogeneity may drive the apparent effect heterogeneity across consortia in our meta-analysis of HISLA, AAAGC, and GIANT (e.g., IGF2BP2 with I² = 78.7; MYO6 with I² = 84.4; Tables 2 and S16). These observations reinforce how studies of one predominant ancestry group, such as Europeans, may fail to identify novel loci or, more likely given how many loci are already known, new signals in known loci when allele frequencies differ across ancestral populations.
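The I² values quoted above summarize effect heterogeneity across consortia. A minimal sketch of how I² follows from Cochran's Q in a fixed-effect, inverse-variance meta-analysis is given below. The per-consortium effect sizes and standard errors are hypothetical placeholders, not values from Table 2, and the sketch does not assume the exact meta-analysis settings used in this study.

```python
import math

def inverse_variance_meta(betas, ses):
    """Fixed-effect inverse-variance meta-analysis with Cochran's Q and I² (%)."""
    weights = [1.0 / se ** 2 for se in ses]
    weight_sum = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / weight_sum
    pooled_se = math.sqrt(1.0 / weight_sum)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (b - pooled_beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    # I² = max(0, (Q - df) / Q), expressed as a percentage.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled_beta, pooled_se, q, i2

# Hypothetical per-consortium estimates for one SNP (NOT study data).
example_betas = [0.030, 0.012, 0.045]
example_ses = [0.006, 0.010, 0.008]

beta, se, q, i2 = inverse_variance_meta(example_betas, example_ses)
print(f"pooled beta = {beta:.4f} (SE {se:.4f}), Q = {q:.2f}, I2 = {i2:.1f}%")
```

When the same variant has different allele frequencies across consortia, the per-consortium standard errors and sometimes the effect estimates differ, which is one way a high I² can arise.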
Residual uncorrected stratification in GWAS could result in biased estimates of effect sizes. 39 For example, effect sizes on height from GIANT were reported to be significantly correlated with the North-South axis of variation in Europe, suggesting residual uncorrected stratification, 92; 94; 95 which we also observe here. Note that the residual stratification effect is subtle, and while the effect sizes may be biased, this does not imply that the identified associations are spurious. For example, the genetic correlation between height effect sizes from GIANT and those from UKBB, which is based on a single, more homogeneous population and therefore allows better control of population stratification, was 0.94. 92 Of the three traits studied here, height is the most stratified in Europe. The correlation coefficient between effect sizes on height and PC loadings reached 0.125 in GIANT alone for PC1, while it was much smaller for other traits (e.g., the maximum |rho| = 0.042 in GIANT for WHR in males only on PC1). The decrease in bias in the trans-ethnic meta-analysis was also most obvious for height. The correlation with PC1 was non-significant in HISLA (rho = 0.007) and statistically significant but weak in AAAGC (rho = 0.012), consistent with a decreased impact of European population stratification on the effect size estimates in AAAGC and HISLA. This decreased correlation could be due to the large non-European ancestry components known in these populations (African and Native American, respectively), which are less affected by population stratification in Europe. It could also be that, by using loadings based on European ancestry, we are less likely to detect population stratification patterns of non-European origin, or that the smaller sample sizes in these cohorts result in greater noise in the effect size estimates.
Regardless of the reason, compared to GIANT alone, the trans-ethnic meta-analysis of the three cohorts showed less impact of uncorrected stratification on effect size estimates, even though the sample sizes in AAAGC and HISLA are comparatively small. For other traits, the conclusions are qualitatively similar: trans-ethnic meta-analysis lessened the bias due to stratification, even though the bias in GIANT was not as strong in the first place.
As described above, in this study we were able to 1) discover six novel loci with a notably smaller analytic sample size than other anthropometric consortia like GIANT, 2) describe 36 new signals in established loci in HISLA or our trans-ethnic meta-analysis, and 3) generate trans-ethnic effect estimates with better control for population structure. Taken together, these findings indicate the added value of building larger, more diverse GWAS in the near future.
Gene expression and bioinformatic analyses of our population-specific (Tables S18-S20) and trans-ethnic findings in newly discovered loci gave us important insights into the underlying biology of obesity, bone development and growth (Tables S22-S23). For example, the previously reported BMI locus C6orf106 has also been associated with adult height 96; 97 and height change during puberty. 98 The first BMI signal described at C6orf106 was at index SNP rs205262, an eQTL for another gene within the region, SNRPC, in European ancestry samples. 38 A second signal (rs75398113) has also been reported at SNRPC for extremes of the body mass index distribution. 99 Yet, our novel signal led by rs148899910 is more than 300 kb away and in low LD with these two index SNPs (r² = 0.01-0.05 in AMR). More recently, rs148899910 has been associated with height in Korean women. 100 Using whole blood gene expression data from 606 participants of the Cameron County Hispanic Cohort, we find evidence that our novel BMI signal at rs148899910 is an eQTL associated with increased expression of C6orf1 (P-value = 3 × 10^-7) and not of any other gene in the region (Table S18).
In general, the lead SNPs from our HISLA-only meta-analyses appear relatively benign (not pathogenic) based on CADD and FATHMM-XF scores (Table S20). All of these SNPs potentially change motifs. Both rs17361324 (ADCY5) and rs215226 (B4GALNT3) have enhancer and promoter histone marks and are eQTLs for the respective genes in relevant tissues. For BMI, rs17361324 is an eQTL for ADCY5 in thyroid, and ADCY5 has been previously associated with type 2 diabetes, 101 BMI, 102 central obesity traits, 43 height, 51 birth outcomes, [103][104][105] and a number of other phenotypes. Additionally, rs17361324 is proximate to an ADCY5 intronic variant (rs1093467, r² = 0.3 in 1000 Genomes AMR) that is highly conserved across species (HaploReg v4.1). For height, rs215226 is an eQTL for B4GALNT3 in aortic and coronary arteries and in tibial nerve. The lead SNP for the height signal in B4GALNT3, rs215226, has enhancer histone marks in bone and muscle, and promoter marks in muscle tissue. In addition, rs215226 has a posterior probability of 1 for causality in FINEMAP analyses (see Table S9). Other relevant information about these regions is provided in Table S19.
The lead SNPs at our newly discovered trans-ethnic loci were mainly located in intronic and intergenic regions (Table S22) and appeared benign. One exception was the lead SNP at the novel height locus C11orf63, rs11605693, which showed pathogenic scores for CADD and FATHMM-XF (CADD score = 17.1 and FATHMM-XF score = 0.87). This lead SNP is an eQTL for C11orf63 in adipose, tibial nerve, and testis. C11orf63 (junctional cadherin complex regulator) is involved in the development of the ependymal cells that line the brain and spinal cord.
Among the trans-ethnic findings, a new signal at a known locus for BMI, rs10540 at RNH1, has a posterior probability of 0.82 as one of two causal variants in the locus, and is an eQTL for a wide range of tissues and genes (see Tables S21 and S23), potentially making it relevant to body mass. A new signal in a known locus for height, led by rs12918773, has a posterior probability of 0.98 and is one of four causal variants suggested by fine-mapping in the locus (Table S21). It is an eQTL (in lung, thyroid, tibial nerve and artery, breast, and testis) for CDK10, a gene also associated with growth retardation. 106 In addition, rs1342330, another new signal in a known height locus, has a low RegulomeDB score of 2b and several enhancer and promoter histone marks in relevant tissues (Table S22). As an intronic variant, it is an eQTL in the pancreas for PHACTR2 (Table S23), a gene associated with body dysmorphic disorder. 107 While many of the novel loci/signals appeared to be benign based on CADD and FATHMM-XF scores, they still show enhancer and promoter histone marks in trait-relevant tissues such as adipose, bone, muscle, thymus, brain, and adrenal gland.
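The posterior probabilities quoted above come from fine-mapping. FINEMAP itself models multiple causal variants and the LD between them; the sketch below swaps in a much simpler stand-in that assumes a single causal variant per region and uses Wakefield's approximate Bayes factor. It is meant only to illustrate how summary statistics translate into per-variant posterior probabilities. The prior standard deviation `prior_sd` and the summary statistics are hypothetical, not values from this study.

```python
import math

def log_abf(beta, se, prior_sd=0.15):
    """Log of Wakefield's approximate Bayes factor (association vs. null).

    prior_sd is an assumed prior standard deviation of the true effect.
    """
    v = se ** 2            # sampling variance of the effect estimate
    w = prior_sd ** 2      # prior variance of the true effect
    z = beta / se
    shrink = w / (v + w)
    return 0.5 * (math.log(1.0 - shrink) + shrink * z * z)

def posterior_probs(betas, ses, prior_sd=0.15):
    """Posterior probability that each variant is THE causal variant,
    assuming exactly one causal variant and equal prior weight per variant."""
    labfs = [log_abf(b, s, prior_sd) for b, s in zip(betas, ses)]
    max_labf = max(labfs)  # subtract the maximum for numerical stability
    rel = [math.exp(l - max_labf) for l in labfs]
    total = sum(rel)
    return [x / total for x in rel]

# Hypothetical summary statistics for three variants in one region (NOT study data).
region_betas = [0.02, 0.05, 0.01]
region_ses = [0.01, 0.01, 0.01]

probs = posterior_probs(region_betas, region_ses)
for i, pp in enumerate(probs):
    print(f"variant {i}: posterior probability = {pp:.4f}")
```

Under this simplification the probabilities across variants in a region sum to 1, so a value near 1 for one variant means the remaining variants carry almost no posterior weight.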
Large-scale analyses of diverse populations hold great potential for advancing the field of genetic epidemiology. 59 This study illustrates how studying admixed populations, like Hispanics/Latinos, and leveraging them in trans-ethnic epidemiologic investigations can yield additional insights into the genetic architecture of anthropometric traits. Future discovery efforts in Hispanic/Latino populations and in other diverse populations will help close the research gap between who is studied and who is affected by conditions like obesity, to the benefit of both public health and precision medicine.

Supplemental Data
Supplemental Data include 23 tables and 54 figures.

Declaration of Interests
SMG and AMS receive funding from Seven Bridges Genomics to develop tools for the NHLBI BioData Catalyst consortium. All other authors declare no competing interests.

Acknowledgements
The Baependi Heart Study was supported through a collaborative effort by FAPESP and Brazil Health Ministry (PROADI). ACP was supported by NHLBI R01HL141881-01A1. The Hispanic Community Health Study/Study of Latinos was carried out as a collaborative study supported by contracts from the National Heart, Lung, and Blood Institute (NHLBI) to the

novel loci and 36 novel signals in known loci in the Hispanic/Latino Anthropometry Consortium (HISLA) Meta-Analysis and the Trans-Ethnic Meta-Analysis of HISLA and Consortia of Other Ancestral Heritages
*Stage 1 maximum sample sizes were 59,771 for BMI, 56,161 for height, and 42,455 for WHRadjBMI (sex combined). **Stage 2 sample sizes were up to 10,538 for BMI, 8,110 for height, and 4,393 for WHRadjBMI (sex combined). Actual sample sizes may vary by SNP. ***The BMI and height-for-age z-score models were conducted using up to 1,914 and 1,945 children/adolescents, respectively. In contrast, the obesity case-control study compared up to 1,814 children/adolescents who were either ≥95th or ≤50th BMI-for-age percentile.

Hispanic/Latino Anthropometry Consortium (HISLA); African American Anthropometry Genetics Consortium (AAAGC); Genetic Investigation of ANthropometric Traits (GIANT); WHRadjBMI - waist to hip ratio adjusted for BMI.

*Asterisks indicate SNPs that were significant either as a novel locus or as a new signal in a known locus.

Table Titles and Legends

Table 1. Potential novel loci and new signals in known loci from the Stage 1: Adult HISLA Discovery combined with results from the Stage 2: Adult HISLA Validation. 1 In addition, lookup of results of each locus from the AAAGC and GIANT.
Abbreviations: Chr - chromosome; EAF - effect allele frequency; HetIsq - heterogeneity Isquare; N - sample size; WHRadjBMI - waist to hip ratio adjusted for BMI; AAAGC - African American Anthropometry Genetics Consortium; GIANT - Genetic Investigation of ANthropometric Traits
1 All studies were meta-analyzed using METAL (PMID 20616382), with each study entered individually into Stage 1+2 analyses.
2 These BMI and WHRadjBMI analyses did not include Brazilian and/or Native American samples.
3 New loci or signals are those that were validated by HISLA Stage 2 results that were directionally consistent with Stage 1 and remained genome-wide significant after meta-analysis with Stage 1.
4 Proxy in GIANT: rs1573905 (r² = 0.96 in AMR)

Table 2. Novel loci and new signals in established loci by trait from a meta-analysis of HISLA, AAAGC, and GIANT.
Abbreviations: Chr - chromosome; EAF - effect allele frequency; HetIsq - heterogeneity Isquare; N - sample size; WHRadjBMI - waist to hip ratio adjusted for BMI; AAAGC - African American Anthropometry Genetics Consortium; GIANT - Genetic Investigation of ANthropometric Traits
1 Each novel locus was defined by the absence of known (previously published) SNPs within 1 Mb (+/- 500 kb) of the lead SNP.
2 Each known locus was defined by a 1 Mb region around previously identified SNP(s) for the indicated trait; the known SNP(s), P<5e-8, at each established locus can be found in Table S16.