A Comprehensive Genome-Wide and Phenome-Wide Examination of BMI and Obesity in a Northern Nevadan Cohort

The aggregation of Electronic Health Records (EHR) and personalized genetics leads to powerful discoveries relevant to population health. Here we perform genome-wide association studies (GWAS) and accompanying phenome-wide association studies (PheWAS) to validate phenotype-genotype associations of BMI, and to a greater extent, severe Class 2 obesity, using comprehensive diagnostic and clinical data from the EHR database of our cohort. Three GWASs of 500,000 variants on the Illumina platform of 6,645 Healthy Nevada participants identified several published and novel variants that affect BMI and obesity. Each GWAS was followed with two independent PheWASs to examine associations between extensive phenotypes (incidence of diagnoses, condition, or disease), significant SNPs, BMI, and incidence of extreme obesity. The first GWAS examines associations with BMI in a cohort with no type 2 diabetics, focusing exclusively on BMI. The second GWAS examines associations with BMI in a cohort that includes type 2 diabetics. In the second GWAS, type 2 diabetes is a comorbidity, and thus becomes a covariate in the statistical model. The intersection of significant variants of these two studies is surprising. The third GWAS is a case vs. control study, with cases defined as extremely obese (Class 2 or 3 obesity), and controls defined as participants with BMI between 18.5 and 25. This last GWAS identifies strong associations with extreme obesity, including established variants in the FTO and NEGR1 genes, as well as loci not yet linked to obesity. The PheWASs validate published associations between BMI and extreme obesity and incidence of specific diagnoses and conditions, yet also highlight novel links. This study emphasizes the importance of our extensive longitudinal EHR database to validate known associations and identify putative novel links with BMI and obesity.

Our study begins with the Healthy Nevada Project (HNP), a project centered around a Northern Nevada cohort formed in 2016 and 2017 by Renown Health and the Desert Research Institute in Reno, NV to investigate factors that may contribute to health outcomes in Northern Nevada. Its first phase provided 10,000 individuals in Northern Nevada with genotyping using the 23andMe platform at no cost. Renown Health is the only tertiary care health system in the area, and 75% of these 10,000 individuals are cross-referenced in its extensive electronic health records (EHR) database. The Renown EHR database contains 86,610 BMI measurements for these 10,000 individuals over twelve years, along with comprehensive disease diagnoses, (e.g. diabetes or eating disorders) and other general conditions such as pregnancy, allowing for precise individual phenotypic classifications and thereby leading to more robust and meaningful phenotype-genotype associations.
The focus of the comprehensive GWAS-PheWAS examinations of the Healthy Nevada Project (HNP) cohort and its EHR database is twofold: the first is to establish infrastructure to perform large-scale genomewide and phenome-wide association investigations in alliance with complex electronic health care records; the second is to validate wellknown published variants and associations with BMI and obesity in this cohort, as well as to identify possibly novel genotypic and phenotypic associations with BMI and extreme obesity.
Additionally, using linked EHR, the PheWASs examined the pleiotropy of HNP BMI and obesity associated SNPs: whether these variants are linked with other endocrine or metabolic diagnoses or conditions of this nature. A second PheWAS identified many known phenotypes related to BMI and obesity, especially to DM2, abnormal glucose levels, hypertension, hyperlipidemia, sleep apnea, asthma and other lessstudied BMI-related diagnoses.

MATERIALS AND METHODS
The Renown EHR database The Renown Health EHR system was instated in 2007 on the EPIC system (EPIC System Corporation, Verona, Wisconsin, USA), and currently contains lab results, diagnosis codes (ICD9 and ICD10) and demographics of more than one million patients seen in the hospital system since 2005.

Sample collection
Saliva as a source of DNA was collected from 10,000 adults in Northern Nevada as the first phase of the Healthy Nevada Project to contribute to comprehensive population health studies in Nevada. Via a donation from the Renown Health Foundation, the study offered the DNA testing for free to any and all participants willing to sign up for the study. The personal genetics company 23andMe, Inc. was used to genotype these individuals using the Oragene DX OGD-500.001 saliva kit [DNA Genotek, Ontario, Canada]. Genotypes are based on the Illumina Human OmniExpress-24 BeadChip platform [San Diego, CA, USA], that include approximately 570,000 SNPs.

IRB and ethics statement
This study was reviewed and approved by the University of Nevada, Reno Institutional Review Board (IRB,. Participants in the Healthy Nevada Project undergo written and informed consent to having genetic information associated with electronic health information in a deidentified manner. All participants were 18 years of age or older. Neither researchers nor participants have access to the complete EHR data and cannot map participants to patient identifiers. Patient identifiers are not incorporated into the EHR; rather, EHR and genetic data are linked in a separate environment via a unique identifier as approved by the IRB.

Processing of EHR data
Most participants had multiple BMI recordings across the thirteen years of EHR; the mean number of BMI records across the individuals was 12.2 records, with 215 the maximum number of records for the cohort. For the 5,811 individuals with more than one recorded BMI measure, a simple quality control step was first performed before computing the average BMI value. More specifically, if a participant had multiple BMI records, the coefficient of variation (CV) of the BMI values was computed; if the CV was greater than 0.096, any outlying BMI values greater than 1.5 standard deviations from the mean BMI for that individual was excluded for that individual. As 10% of the multiple BMI records presented a CV of 0.096 or greater, this threshold was chosen. Note that the relatively small percentage of participants presenting a CV of approximately 10% or greater indicates that variation across multiple records is unremarkable in most individuals. Indeed, the additional quality control step excluded one or more outlying BMI records in only 106 individuals; these 701 BMI records included values such as "2823.42" and values less than 10. Examples of outliers include 158.38 in an individual's set of values with mean 25.3 and "2874" in an individual with mean BMI measure of 22.4. Additionally, this quality control step allowed the study to include pregnant women: of the 464 pregnant women with BMI recorded for pregnant and nonpregnant phases, outlying pregnancy-related BMI records were easily identified and removed. The raw BMI values and quality-controlled average BMI values are presented in Supplementary Figure S1.
Genotyping and quality control Genotyping was performed by 23andMe using the Illumina Infimum DNA Human OmniExpress-24 BeadChip V4 (Illumina, San Diego, CA). This genotyping platform consists of approximately 570,000 SNPs. DNA extraction and genotyping were performed on saliva samples by the National Genetics Institute (NG1), a CLIA licensed clinical laboratory and a subsidiary of the Laboratory Corporation of America.
Raw genotype data were processed through a standard quality control process (Anderson et al. 2010;Verma et al. 2016;Schlauch et al. 2016;Verma et al. 2018;Schlauch et al. 2018). SNPs with a minor allele frequency (MAF) less than 0.005 were removed. SNPs that were out of Hardy Weinberg equilibrium (p-value , 1x10 -6 ) were also excluded. Any SNP with a call rate less than 95% was removed; any individual with a call rate less than 95% was also excluded from further study. There was an observable bias within the African American subcohort, thus 89 African American participants were excluded from this study. Specifically, upon plotting the first two principal components there was a clear population stratification between the African Americans and all other individuals who showed no other distinct groupings. The first principal component explained 14% of the total population stratification. Additionally, 107 patients with type I diabetes were removed, as were 29 participants with eating disorders recorded into their health record. After quality control, this left 500,508 highquality SNPs and 6,645 participants in the BMI cohort with mean autosomal heterozygosity of 0.318. The same process yielded 5,994 participants when all individuals with DM2 diagnoses were removed. Within the extreme obesity study, participants with BMI values between 18.5 and 25 were considered as controls, while any participant with BMI at least 35 kg/m 2 was considered a case subject. Again, any individual with type I diabetes and recorded eating disorders was removed. This resulted in a cohort size of 2,994 participants with 984 extreme obese cases and 2,012 lean controls with a mean autosomal heterozygosity of 0.316.
A standard principal component analysis (PCA) was performed on the genotype data to identify principal components to correct for population substructure. Genotype data were pruned to exclude SNPs with high linkage disequilibrium using PLINK v1.9 (Purcell et al. 2007) and standard pruning parameters of 50 SNPs per sliding window; window size of five SNPs; r 2 =0.5 (Anderson et al. 2010). The remaining 270,160 SNPs were used to calculate the principal components. Regression models were adjusted by the first four principal components, decreasing the genomic inflation factor of all obesity and BMI traits to l # 1.06.

Genome-wide association studies
Using PLINK, we first performed a simple linear regression of BMI vs. genotype using the additive model (number of copies of the minor allele) including age, gender and the first four principal components as covariates to correct for any bias generated by the population substructure. In the first BMI study, participants with DM2 were excluded. The second BMI study included DM2-diagnosed participants and included DM2 as a covariate in the statistical model. To test associations between obesity and genotype, a standard case-control logistic regression was applied, adjusting for age, gender and the first four principal components. Total phenotypic variance explained by all 500,508 SNPs was calculated by first producing a genetic relationship matrix of all SNPs on autosomal chromosomes in PLINK. Subsequently, a restricted maximum likelihood analysis was conducted using GTCA (Yang et al. 2011) on the relationship matrix to estimate the variance explained by the SNPs.

Analysis of variance
The mean BMI values across genotypes presented in Supplementary Tables S1 and S2 correlate with negative and positive effect sizes: SNPs showing a negative effect size have a decrease in mean BMI values across the genotypes from left to right (homozygous in major allele, heterozygous, homozygous in minor allele). The 6,645 log-transformed quality-controlled and averaged BMI measures were nearly normally distributed. As one-way ANOVA computations are robust against even moderate deviations of normality (Blanca et al. 2017), parametric ANOVA methods were used to make comparisons across the genotypes. All ANOVA F-test p-values of the significant SNPs identified in the two BMI studies are statistically significant at the alpha=0.05 level, even after a simple Bonferroni correction (.05/27 =0.0019, and .05/20=0.0025, respectively). Supplementary Table S3 presents the proportion of obese cases across each genotype. A simple test of equal proportions (Pearson's chi-square test) is performed across these proportions. All p-values associated with the test of equal proportions in Supplementary Table S3 are also statistically significant at the 0.05 significance level upon a conservative Bonferroni multiple testing adjustment (.05/34=-.0015).

Power of GWAS
The software program QUANTO (Gauderman 2002) was used to calculate sample sizes to detect effect sizes in the range [0.5,1] and odds ratios in [1,1.5] with at least 80% power under the additive model, at a two-sided Type I error level of 5%. Using the rate of extreme obesity (BMI $ 35 kg/m 2 ) as 14.5% from Ogden et al. (Ogden et al. 2006), the case-control GWAS study of approximately 1,000 cases and 2,000 controls has sufficient statistical power ($ 80%) with MAFs of 16% or greater to detect odds ratios of 1.225 or greater. As the MAF increases, the power to detect smaller odds ratios increases: for example, with a MAF of 25%, our sample size was adequate to detect odds ratios of size 1.18 or higher. With a small MAF of 8%, the power was also at least 80% to detect effect sizes as small as 0.58 in the BMI GWAS cohort of 6,645. With MAF of 17%, power was at least 80% to detect effect sizes as small as 0.425. Larger MAFs clearly can detect larger effect sizes with the same sample size. Specific effect sizes and MAFs can be seen in Table 3.

Phenome-wide association study
The R package PheWAS ) was used to perform two independent PheWAS analyses for each of our studies. The first examined associations between statistically significant SNPs identified in the respective GWASs and EHR phenotypes based on ICD codes. The second PheWAS identified associations between BMI levels or incidence of obesity, respectively, and ICD-based diagnoses. ICD9 and ICD10 codes for each individual in the cohort recorded in the Renown EHR were aggregated via a mapping from the Center for Medicare and Medicaid services (https://www.cms.gov/Medicare/Coding/ICD10/ 2018-ICD-10-CM-and-GEMs.html). A total of 22,693 individual diagnoses mapped to 4,769 documented ICD9 codes. ICD9 codes were aggregated and converted into 1,814 individual phenotype groups ("phecodes") using the PheWAS package as described in Carroll and Denny (Denny et al. 2013;Carroll et al. 2014). Of these, only the phecodes that included at least 20 cases were used for downstream analyses, following Carroll's protocol . Age and gender were standard covariates included in the PheWAS models. The first type of PheWAS detected associations between statistically significant SNPs (p,1x10 -5 ) identified in each of the three GWASs above and case/control status of EHR phenotypes represented by ICD codes. Specifically, a logistic regression between the incidence (number of cases) of each phenotype group (phecode) and the additive genotypes of each statistically significant SNP was performed, including age and gender as covariates. Possible associations of the phecodes with at least 20 individuals with each previously detected SNP were assessed. Two levels of significance were computed: the first, on which the reported results are based, was generated by first calculating the adjusted p-values for the multiple hypothesis tests using the Benjamini-Hochberg false discovery rate (FDR) (Benjamini and Hochberg 1995) and selecting the raw p-value corresponding to the FDR = 0.1 significance level, following Denny's protocol (Denny et al. 2013). This level is represented by a red line in PheWAS images. For visual purposes only, a threshold based on a Bonferroni correction for all possible associations made in this analysis (p=0.05 / N ps , where N ps is the sum of the number of phecodes tested for each individual SNP, across all identified SNPs), is represented by a blue line in PheWAS images.
A second PheWAS, as outlined in Carroll et al. (2014)  , was performed to examine associations between BMI, and secondarily, obesity, and the phecodes. Specifically, a linear regression between the BMI measures and the case/control status of a phecode was performed (with age and gender as covariates) for each phecode including at least 20 individuals. Significance levels corresponded to the FDR value of 0.1 and are not shown in either figure due to space constraints. The Bonferroni corrections for the BMI study and obesity study were 3.3x10 -5 and 3.7x10 -5 , respectively. In Figures 4 and 6

Genotype data
The data that support the findings of this study are available from 23andMe but restrictions apply to the availability of these data, which were used under license for the current study, and thus are not publicly available. The data are, however, available for qualified researchers upon reasonable request and with permission of the Institute for Health Innovation and 23andMe. Researchers who would like to obtain the raw genotype data related to this study will be presented with a data user agreement, which requires that the participants will not be re-identified and no data will be shared between researchers or uploaded onto public domains. Due to the public nature of this article, and genetic privacy requirements, the Institute for Health Innovation and 23andMe require that the statistics for only 10,000 SNPs be made publicly available. This is the amount of data considered to be insufficient to enable a reidentification attack. The statistical summary results of the top 10,000 SNPs for the 23andMe data are available here: www.dri.edu/HealthyNVProjectGenetics. All column definitions are listed in Table 1.
The IHI collaborates with scientific researchers on an individual basis. Examples of restrictions that will be considered in requests to data access include but are not limited to: 1. Whether the request comes from an academic institution in good standing and will collaborate with the Institute for Health Innovation to protect the privacy of the participants and the security of the data requested 2. Type and amount of data requested 3. Feasibility of the research suggested 4. Amount of resource allocation for the IHI and Renown Hospital required to support the collaboration Any correspondence and data availability requests should be addressed to Joe Grzymski at (joeg@dri.edu) or Craig Kugler (craig.kugler@dri.edu).

PheWAS results
Summarized counts of each ICD classification (ICD-9 and ICD-10) and phenotype group (phecode) are presented in Supplementary Table S7.

Institutional review board
This study was reviewed and approved by the University of Nevada, Reno Institutional Review Board (IRB, project 956068-12).

RESULTS
Characteristics of cohort 10,000 participants were genotyped of which 6,870 participants were linked to Renown Hospital electronic health records with the covariates required to conduct this study: age at consent, gender, ethnicity and BMI. We removed 225 participants based on the exclusion criteria as stated in our methods. Our final cohort characteristics of the 6,645 individuals are described in Table 2, which illustrates the makeup of the cohort with respect to gender, age, ethnic origin, and standardized value of BMI after removal of outliers using our custom algorithm. For the extreme obesity (Class 2 and 3) case vs. control study, the normal (healthy) control range consisted of BMI values between 18.5 and 25 kg/m 2 , while the case obese values were any BMI $ 35 kg/m 2 (Hruby and n■ The value of the test statistic P The p-value for the association test Hu 2014). The number of participants in each range is displayed in Table 2.

GWAS of BMI in the healthy Nevada cohort
Using quality-controlled BMI values, two separate GWASs were performed to find genotypic associations with BMI using PLINK. In the first association study, all individuals diagnosed with DM2 were excluded to focus on the association between the genotype and BMI under the additive model with adjustments for gender, age and the first four principal components (PC1-PC4). The second association analysis included all DM2-diagnosed individuals and added DM2 as a covariate to the model, again using the additive model with adjustments for gender, age, diabetes status, and PC1-PC4. Genomic inflation coefficients (lambda) were computed for the two separate cohorts: 1.06 for the association without DM as a covariate, and 1.06 for the association where DM is a covariate. Any SNP with association p-value of p , 1x10 -5 was considered a statistically significant association, based on the standard of the NHGRI-EBI Catalog of published genome-wide association studies [https://www.ebi.ac.uk/gwas/docs/methods/criteria], as well as obesity studies performed by Frayling et al. . Genetic variance in the BMI study with DM2 cases removed was 15.78%; genetic variance was 17.49% in the BMI study with DM2 cases included. The first GWAS was performed on 5,994 total participants without DM2 and identified 20 SNPs across seven chromosomes at statistical significance defined by p,1x10 -5 (Table 3). The majority of these mapped to the FTO gene on chromosome 16, while two SNPs mapped to TDH on chromosome 8 (Supplementary Figure S2). Of the 20 SNPs, 15 were shown to be associated with BMI in previous publications (Scuteri et al. 2007;Frayling et al. 2007;Dina et al. 2007;Zeggini et al. 2007;Yanagiya et al. 2007;Hinney et al. 2007;Hunt et al. 2008;Price et al. 2008;Grant et al. 2008;Hotta et al. 2008;Loos et al. 2008;Tan et al. 2008;Villalobos-Comparán et al. 2008;Thorleifsson et al. 2008;Willer et al. 2009;Meyre et al. 2009;Wing et al. 2009;Liu et al. 2010;Shimaoka et al. 2010;Fawcett and Barroso 2010;Speliotes et al. 2010;Wang et al. 2011;Prakash et al. 2011;Okada et al. 2012;Cha et al. 2012;Berndt et al. 2013;Wheeler et al. 2013;Graff et al. 2013;Olza et al. 2013;Boender et al. 2014;Qureshi et al. 2017;Hudek et al. 2018;González-Herrera et al. 2019). A large majority of the SNPs (17/20) lie within noncoding regions of genes and are intronic in nature. It is interesting to note that our strongest associations lie within the FTO gene (p , 3.5x10 -6 ). Results are presented in Table 3: BMI without DM2 lists the significant associations of our cohort that exclude all DM2 diagnoses. BMI with DM2 presents significant associations with BMI in the cohort that includes participants with a DM2 diagnosis. Effect sizes (and their standard deviations) are presented as change in BMI per each copy of the minor allele. Raw per-SNP p-values are presented.
The SNP on chromosome 17 is of particular interest, as it has the largest effect of any SNP identified in our study (b=0.90). It is also the rarest SNP tested in our cohort with minor allele frequency (MAF) 8.17%. The median MAF across the strongest associative SNPs in both studies is 40%, which demonstrates that most of the SNPs are common and thus result in relatively moderate individual effect sizes. Most of the SNPs lie within noncoding intronic regions; this is of interest as although noncoding SNPs do not alter the amino acid sequence of the translated protein, they may be linked to disease, as several previous studies show that polymorphisms within introns can affect intron splicing n■ as well as transcriptional and translational efficiency (Lalonde et al. 2011).
Case-control GWAS of extreme obesity in the healthy Nevada cohort A complementary GWAS was performed to identify genotype-phenotype links in extreme obesity (BMI $ 35) versus non-obese (BMI between 18.5 and 25 kg/m 2 ) in our cohort. This study incorporated 2996 participants (984 extreme obese cases, 2012 non-obese controls), and under the log-additive model with adjustments for gender, age and the first four principal components, identified 26 SNPs across six chromosomes that were associated with obesity at p , 1x10 -5 , with approximately 70% associated with obesity and BMI in prior studies ( Figure 2). The percentage of phenotypic variance attributed to genetic variation based on all the SNPs was 15.7%. The genomic inflation coefficient (lambda) for the obesity cohort was computed as 1.05. We also include eight SNPs found slightly above the significance threshold in the FTO gene that are reported in several studies as obesity-related (Ehrlich and Friedenberg 2016;West et al. 2018). In comparison to the two quantitative-trait BMI GWASs, this study identified several more associations around the NEGR1 gene on chromosome 1. We also identified SNPs in two genes, PFKFB3 and n■  CABP5, which are associated with obesity in other studies (Scuteri et al. 2007;Jiao et al. 2008;Nakajima et al. 2016). Note that all the mutations in the FTO gene increase the odds of obesity risk. Table 4 lists the strongest SNPs associated in our extreme obese vs. non-obese GWAS. Effect sizes and their standard deviations are presented as odds ratios. Raw p-values generated by the GWAS are also presented. From our three separate GWASs, we identified fifteen different chromosomal cytoband regions across ten chromosomes associated with at least one BMI or obesity-related trait. All but three of those cytoband regions contained a gene, while the remaining cytoband hits were in noncoding regions of the genome. Approximately 70% of the SNPs identified in this study were linked to BMI and obesity in prior studies (Tables 3 and 4), validating our methods (Scuteri et al. 2007;Frayling et al. 2007;Dina et al. 2007;Yanagiya et al. 2007;Hinney et al. 2007;Jiao et al. 2008;Pietiläinen et al. 2008;Grant et al. 2008;Hotta et al. 2008;Thorleifsson et al. 2008;Joe et al. 2009;Nakajima et al. 2016;Thomsen et al. 2016;Justice et al. 2017). The functions of the genes which lie within the cytoband regions are outlined in Table 5.

Analysis of variance
The mean BMI values across genotypes presented in Supplementary Tables S1 and S2 correlate with negative and positive effect sizes: SNPs showing a negative effect size have a decrease in mean BMI values across the genotypes from left to right (homozygous in major allele, heterozygous, homozygous in minor allele). Note that BMI levels increase with the increase of the number of minor alleles, which is typical of variants in FTO ). All ANOVA F-test p-values of the significant SNPs identified in the two BMI studies are statistically significant at the alpha=0.05 level, even after a simple Bonferroni correction (.05/27 =0.0019, and .05/20=0.0025, respectively). Supplementary Table S3 presents the proportion of extremely obese cases across each genotype. A box and whisker figure of ANOVA results for one of the strongest associations (rs9939609) is shown in Supplementary Figure S3.

PheWAS of BMI and obesity
Beyond the GWASs, we present here two comprehensive PheWAS studies that follow each GWAS. The first examines pleiotropy, i.e., whether additional phenotypic associations exist between the statistically significant SNPs associated with BMI or obesity in our cohort. The second investigates which EHR phenotype groups are associated with BMI; more specifically, the analysis identifies whether the number of individuals in an EHR phenotype group is a predictor of BMI and/or extreme obesity.
A PheWAS tested the 20 statistically significant SNPs identified in the first BMI GWAS (the sub-cohort with no DM2-diagnoses) for association with 562 EHR phenotypes and resulted in no statistically significant associations at the false discovery rate of 0.1 (data not shown). The top two associations showed that a locus on FUT9 (rs4839813) associated with obesity, and rs1620977 on NEGR1 associated with morbid obesity with raw p-values p=2.3x10 -5 and 2.8x10 -5 , respectively. The second PheWAS identified 179 EHR (phenotypic) associations of BMI (DM2-diagnosed participants excluded) with p,2x10 -2 , that associates to an adjusted p-value of 0.1 (see Materials and Methods). Included in the strongest phenotypic associations are obesity, morbid obesity, and overweight (p,1x10 -80 ), sleep apnea (p,1x10 -45 ), hypertension (p,1x10 -40 ), abnormal glucose (p,1x10 -25 ), hyperlipidemia, asthma, GERD, osteoporosis, and others. (Data not shown).
The PheWAS of the second BMI GWAS (DM2-diagnosed individuals included) examined whether 633 EHR phenotype groups containing at least 20 participants are dependent on the genotypes of the 27 statistically significant SNPs associated with BMI in our cohort (Figure 3). Results of this PheWAS indicate that TDH and FAMA167-AS1 are strongly associated with DM2. Variants in the FTO gene associate with obesity and overweight phenotypes. An association with hypertension and essential hypertension and the locus rs12412241 on chromosome 10 was detected (p,1x10 -4 ). Several strong associations with FTO loci and the prostate-specific antigen (PSA) were also found. Variants in the FTO gene also associated with obesity at the p=3x10 -4 level, and with hypercholesterolemia with p=1x10 -3 level. Significant associations (p,1.02x10 -4 , associated to an adjusted p-value of 0.1, see Materials and Methods) are included in Table 6 and illustrated in Figure 3, where the blue line represents the Bonferroni correction of p=3x10 -6 . The second PheWAS examined links between BMI and the 1,523 EHR phenotype groups containing at least 20 individuals in this cohort and showed that 301 such clinical phenotypes groups associated with significance p,1.96x10 -2 . (This significance level is associated to an adjusted p-value of 0.1, as described in the Materials and Methods) These are shown in Figure 4. Included in the highest associations are obesity, morbid obesity, and overweight (p,1x10 -100 ), DM2 (p,1x10 -87 ), hypertension ((p,1x10 -82 ), sleep apnea (p,1x10 -80 ), abnormal glucose (p,1x10 -53 ), hyperlipidemia, asthma and other respiratory disorders, GERD, edema, liver disease, mood disorders, polycystic ovaries, and others. Significant associations are presented in Supplementary Table S4 and Figure 4. Only associations at p,1x10 -15 are annotated in the image for ease of viewing. Note that a single-SNP Bonferroni correction results in a significance level of 3.3x10 -5 .
We performed the same two PheWASs on the case-control GWAS. The first PheWAS identified possible associations between 34 SNPs and 372 phenotype groups with at least 20 individuals ( Figure 5). The significance level corresponding to an FDR of 0.1 was p=3.85x10 -4 , resulting in 50 significant associations. The Bonferroni correction p-value is 4x10 -6 and shown in blue (see Materials and Methods). The strongest associations occurred between the NEGR1 and KLRB1 gene and obesity and morbid obesity and the NEGR1 gene and abnormal glucose. The NEGR1 gene, along with TDH and FAM167A-AS1, also associated with impaired fasting glucose and diabetes, respectively, at a slightly lower, yet still significant p-values. The locus rs2948300 on chromosome 8 associated with essential hypertension. The SNP rs1620977 on NEGR1 is linked with incidence of bronchitis. Additionally, several strong associations of irritable bowel syndrome (IBS) and digestive disorders with loci in CABP5 are shown. Results are pictured in Figure 5 and included in Table 7.
The second PheWAS in this case-control study identified possible links between 1,362 EHR phenotype groups with at least 20 individuals and the incidence of extreme obesity ( Figure 6). The significant threshold of p=1.4x10 -2 enabled an FDR of 0.1. At this significance level, 191 significant associations were identified, including obesity (p,1x10 -134 ), hypertension (p,1x10 -65 ), sleep apnea (p , 1x10 -43 ), abnormal glucose (p,1x10 -40 ), hyperlipidemia, asthma, and GERD. The high level of association with obesity validates our methods. These associations are shown in Figure 6, with phenotypes annotated above a significance level of 1x10 -15 for ease of viewing. A line is drawn at the significance level 1x10 -15 as guidance. Results are included in Supplementary Table S5. Note that a single-SNP Bonferroni correction results in a significance level of 3.7x10 -5 .

GWAS of healthy Nevada BMI and obesity
Here we present three GWASs on participants in the Healthy Nevada Project. The first investigates associations with BMI on subjects that do not have DM2 diagnoses. The second identifies associations with BMI in which participants with DM2 are included as a comorbidity. The third GWAS is a case-control study of extreme obesity that complements outcomes of the first two quantitative trait studies. Each GWAS is followed with two independent PheWASs to examine pleiotropy and additional phenotypic associations with quantitative BMI levels and incidence of obesity.
The first GWAS tested the association between genotype and BMI without DM2, to remove DM2 effects on BMI. As expected, the majority of the resulting associations were found in SNPs that lie within the FTO gene. This gene has been associated to BMI and obesity in several studies and is a major focal point in obesity-related research (Scuteri et al. 2007;Frayling et al. 2007;Dina et al. 2007;Zeggini et al. 2007;Hinney et al. 2007;Hunt et al. 2008;Price et al. 2008;Grant et al. 2008;Hotta et al. 2008;Loos et al. 2008;Tan et al. 2008;Villalobos-Comparán et al. 2008;Thorleifsson et al. 2008;Willer et al. 2009;Meyre et al. 2009;Wing et al. 2009;Shimaoka et al. 2010;Fawcett and Barroso 2010;Wang et al. 2011;Prakash et al. 2011;Okada et al. 2012;Berndt et al. 2013;Wheeler et al. 2013;Graff et al. 2013;Olza et al. 2013;Qureshi et al. 2017;González-Herrera et al. 2019). Moreover, the two strongest associations in the GWAS were from SNPs in FTO (rs1558902/ rs1421085, p = 1.67x10 -7 /1.75x10 -7 ), highlighting the overall importance of this gene in relation to obesity. Frayling suggests that the association of FTO SNPs with DM2 is mediated through BMI ). The exact mechanism by which the FTO gene affects BMI is not understood; however, it has been discovered that the gene product of FTO mediates oxidative demethylation of several different RNA species, such as mRNA, snRNA and tRNA (Jia et al. 2011;Wei et al. 2018). This indicates that protein produced from FTO likely operates as a RNA regulatory molecule, which can affect both gene expression as well as translation initiation and elongation (Wei et al. 2018).
Two SNPs within TDH were found to be strongly associated to BMI. This gene codes for a nonfunctional L-threonine dehydrogenase, lacking most of the C-terminus found in other species, and is thus characterized as a putative pseudogene. Previous research has identified this gene as a possible susceptibility gene for obesity (Yanagiya et al. 2007); however, relatively little is known about any functional consequences of SNPs within this pseudogene. We also observed a strongly associated locus in NEGR1, one of the first genes shown to have variants associated to BMI (Thorleifsson et al. 2008;Willer et al. 2009;Speliotes et al. 2010;Boender et al. 2014). This gene codes for a cell adhesion molecule, although its function in relation to BMI is still unknown. Previous research in mice determined that deletions of NEGR1 cause a decrease in weight and a change in the regulation of energy balance, implying that NEGR1 most likely functions to control the regulation of energy balance (Lee et al. 2012;Boender et al. 2014).
We hypothesized that including DM2 participants (and thus DM2 as a covariate in the genetic model) would produce a more parsimonious fit, as many studies show a relationship between diabetes and BMI. We discovered that diabetes was indeed an important predictor of BMI for all n■ n■ Possibly has a role in learning and longterm memory (Micu et al. 2006) This table presents functions of genes associated to all SNPs found significantly associated to one or more BMI and obesity traits in the GWASs. 500,000 regressions performed (p , 2x10 -16 for every SNP), regardless of age, gender or genotype. Furthermore, adding DM2 as a predictor in the additive model increases the significance of associations between SNPs in the TDH gene and BMI; five out the top eight most significant associations fall within this gene (Table 3). It is clear that incidence of DM2 in our cohort affects the genetic association of BMI. Specifically, when DM2 patients are excluded in our cohort, there are two associations within TDH. When DM2 participants are included, we observe five associations with the TDH gene. This indicates, along with the BMI PheWAS results, that not only does TDH influence BMI measurements, it also has an association with DM2. It is rare for SNPs to be effectors of two separate diseases, even those as intertwined as BMI and DM2 (Grarup et al. 2014). A possible explanation why TDH has not been previously associated with DM2 is due to a lack of statistical power to observe the small risk increases TDH may impose on DM2 (Grarup et al. 2014). Nonetheless, the increased rate of DM2 diagnosis worldwide makes this an interesting candidate gene. How the SNPs in the TDH pseudogene may influence either BMI or DM2 is unknown, as evidence supporting the association between TDH and BMI/DM2 is scant; however, previous research has discovered that not all pseudogenes are "junk" DNA. Some of these genes can be actively transcribed to produce short interfering RNAs (siRNAs), which can regulate gene expression (Pink et al. 2011). In certain cases, they can even competitively bind micro-RNAs (miRNAs), which can attenuate repression of cellular mRNA . Additionally, the expression of pseudogene transcripts tends to be tissue-specific. Given that the greatest expression of TDH transcripts is found in the pancreas (Fagerberg et al. 2014), one might speculate that TDH affects the production of insulin and/or digestive enzymes. If true, this may account for our observation where TDH influences BMI measurements, and is associated with DM2. Given the strong associations between TDH SNPs and DM2, as well as potential regulatory functions of pseudogenes, we believe it is essential that future studies focus on determining the function of the TDH pseudogene in a tissue-specific context. Although studies using genetically modified mice with a TDH polymorphism and proteomics analysis of their pancreatic tissue would be straight-forward, to the best of our knowledge, no such studies have been reported. We see this as a possible future direction.
Adding DM2 as a covariate into the statistical model also identified two additional genes that may influence BMI: RTN4RL1 and CABP5. RTN4RL1 is a gene that codes for a cell surface receptor and was previously found to be upregulated approximately 2-fold when exposed to bone morophogenetic protein 4 (BMP4), a protein that is increased in diabetic animals and may reduce insulin secretion (Christensen et al. 2015). This implies that the effects of RTN4RL1 on BMI may be secondary to its main effect on diabetes. Moreover, this gene has also been listed as a potential candidate gene for DM2 in previous GWAS (Thomsen et al. 2016). The gene CABP5, which codes for a calcium binding protein that has role in calcium mediated cellular signal transduction (Haeseleer et al. 2002) may have a more direct effect on BMI. It was previously discovered as part of a group of several genes that were upregulated in obese individuals, although its exact function relative to obesity is still unknown (Nakajima et al. 2016).
A case-control association study examining the effects between genetics and the risk of extreme obesity (BMI $ 35 kg/m 2 ) was the final GWAS we conducted. It has been determined by the World Health Organization (WHO) that more than 1.9 billion adults are overweight and over 650 million are obese. Moreover, obesity is associated with several other chronic diseases, such as cardiovascular disease, DM2 and cancer, all of which could lead to premature death (Kopelman 2007). Overall, our obesity results consist of many of the same SNPs and genes found to be associated with BMI. However, the obesity results did demonstrate an increase in the genetic associations at the significance level of p,1x10 -5 in and very close to the NEGR1 gene compared to previous BMI associations. Previous studies using genetically modified mice with NEGR1 deficiency or NEGR1-loss-of-function support a role for NEGR1 in the control of body weight; however, the mechanism of its involvement is not clear (Lee et al. 2012). Contradictory with anticipated results, these mutant mice display a small but steady reduction of body mass. Notwithstanding, these studies do Figure 3 PheWAS results between BMI-significant SNPs and EHR Phenotypes. This figure shows the results of individual logistic regressions between incidence of 633 phenotype groups (phecodes) and the genotypes of 27 SNPs found to have statistically significant associations with BMI in a cohort with DM2 patients. Each point represents the p-value of one SNP and one of 633 phecodes with at least 20 cases assigned to it. The horizontal red line represents the significance level p=1.02x10 -4 , and the blue line represents the Bonferroni correction of p=3x10 -6 .
suggest that loss of NEGR1 function in the mouse models has a negative effect on body mass as well as lean mass, supporting the possibility that NEGR1 may contribute to a change body mass. It is also important to note that animal models are a representation of human physiology but not necessarily a precise depiction.
Our extreme obesity vs. non-obese study also identified two new genes, PFKFB3 and KLRB1, that are not yet found to be significantly associated to BMI. The odds ratios associated with all SNPs found in these genes were less than one, indicating that they decreased the odds of extreme obesity risk. These results are potentially supported by work conducted by Huo et al., who reported that mice transgenically modified to selectively overexpress PFKFB3 in adipocytes show increases fat deposition in their adipose tissue (Huo et al. 2012). In contrast, an earlier study reported that transgenic mice with reduced PFKFB3 expression show exacerbated diet-induced insulin resistance (Huo et al. 2010). Kerr et al. showed that inhibition of PFKFB3 mRNA impairs basal and insulin stimulated lipogenesis and also proposed that decreasing the expression of PFKFB3 may inhibit adipocyte lipid storage (Kerr et al. 2019). This would support a hypothesis that polymorphisms leading to a decrease in PFKFB3 expression may be protective from the development of obesity; however, tissue-specific transcriptional studies in humans would be required to fully support this assertion.
We also observed polymorphisms in KLRB1 correlate with a decrease of odds in extreme obesity risk. KLRB1 expression produces a type II transmembrane glycoprotein also known as CD161; a member of the C-type lectin superfamily. CD161 is expressed on the surface of most natural killer (NK) cells and natural killer T (NKT) but also on subsets of peripheral T cells and CD3 + thymocytes. While the biological function of CD161 is not firmly established, it was suggested that it serves either as a stimulatory receptor or to inhibit NK cell-mediated cytotoxicity and cytokine production (Lanier et al. 1994). Indeed, NK cells were shown to be upregulated in the fat of obese twins (Pietiläinen et al. 2008); moreover, BMI and KLRB1 expression may be correlated in that KLRB1 transcription has been reported to increase as BMI increases (Rai et al. 2014). Additionally, CD161 bright CD8 + mucosal associated invariant T (MAIT) cells play a central role in maintaining mucosal immunity and therefore, changes in CD161 expression on these cells may lead to alterations in mucosal immunity and gut microbiota homeostasis. These changes may in turn manifest as alterations of dietary metabolism. It is noteworthy that increases in MAIT cells are associated with Juvenile Type 1 Diabetes and polymorphisms in KLRB1 have been associated with ischemic heart disease (Makeeva et al. 2015), and differential transcription of KLRB1 has been reported in DM2 and coronary artery disease (Gong et al. 2017). Furthermore, another gene, GRIN2A, that is part of the family of genes {GRIN1, GRIN2A, GRIN2B, GRIN2C, GRIN2D, GRIN3A, and GRIN3B}, which encode proteins that form a receptor in charge of sending chemical messages between neurons in the brain, was found to be associated to obesity in adult women defined as metabolically healthy in . (Schlauch et al. 2018).
Associations between FTO and obesity were just under genomewide significance levels. This is a possible indication that FTO polymorphisms cause small changes in BMI, rather than the wide range differences observed between extreme obese cases and controls. Nonetheless, previous research has demonstrated that a combination of several FTO mutations will increase the likelihood of a participant being n■ The comprehensive series of GWASs presented here validates associations of obesity and BMI found in previous studies, such as the FTO and NEGR1 loci (Willer et al. 2009;Speliotes et al. 2010;Okada et al. 2012;Locke et al. 2015). Many larger studies identify associative loci in MC4R (Willer et al. 2009;Speliotes et al. 2010;Okada et al. 2012). While our studies did not detect SNPs in MC4R with genome-wide significance, they did identify associations at p=1x10 -4 and p=1.7x10 -4 of SNPs rs17782313 and rs571312 (Willer et al. 2009;Speliotes et al. 2010). A number of obesity case-control studies have found variations Figure 4 PheWAS results between BMI and EHR Phenotypes. This figure illustrates the results of individual linear regression between incidence of phenotype groups (phecodes) and the continuous BMI metric of all 6,645 individuals. Each of the 301 points represents the p-value of the association between one of 1,523 phecodes with at least 20 cases assigned to it, and BMI. Statistical significance was assessed by using the False Discovery Rate of 0.1, corresponding to a raw p-value of 1.96x10 -2 . Only associations with p , 1x10 -15 are annotated for ease of viewing, represented by a horizontal line at 15 on the y-axis.

Figure 5
PheWAS results between obesity-significant SNPs and EHR Phenotypes. This figure presents results of individual logistic regressions between incidence of 372 phenotype groups (phecodes) and the genotypes of 34 SNPs found to be associated with extreme obesity. Each point represents the p-value of one SNP and one of 372 phecodes with at least 20 cases assigned to it. The horizontal red line represents the significance level p=3.85x10 -4 , and the blue line represents the Bonferroni correction of p=4x10 -6 . in the MC4R gene (Xi et al. 2012;Evans et al. 2014). Our case-control study does reveal that rs17782313 in MC4R associates with obesity at p=3x10 -4 . Our cohort is a controlled, regional population. The next two stages of the Healthy Nevada Project will add between 40,000 (2019) and 150,000 (late 2020) more Nevadans to the current cohort. With these much larger cohort sizes, it is our hope that a stronger associate link with MC4R will be identified.

PheWAS of healthy Nevada BMI and obesity
To the best of our knowledge, this is the first dual-PheWAS targeted at BMI and obesity. Cronin et al. present a comprehensive PheWAS targeted at FTO variants, which also show strong associations with overweight and obesity phenotypes, hypertension and hyperlipidemia (Cronin et al. 2014). Milliard et al. perform a large PheWAS study to examine phenotypic associations with BMI that focus on the nervousness phenotypes: the study identified known associations such as diabetes and hypertension (Millard et al. 2019).
The PheWAS performed on the SNP associations in this study's BMI cohort identified strong associations of elevated PSA levels with variants in FTO, and indicates that the number of minor alleles of these variants is predictive of elevated PSA. This finding is in contradiction to the reports (Bañez et al. 2007;Oh et al. 2013;Zhang et al. 2016;Bonn n■  et al. 2016) indicating an inverse relationship between PSA levels and BMI. However, serum levels of PSA may be elevated due to reasons other than prostatic malignancy. Benign prostatic hyperplasia (BPH), prostatitis (Nadler et al. 1995), ejaculation (Herschman et al. 1997), or manipulation of the prostate gland (Chybowski et al. 1992;Crawford et al. 1992;Tarhan et al. 2005) may cause elevated levels of serum PSA. Our study did not control for such parameters. Our sample includes only 71 individuals with ICD codes indicating high PSA levels, of which a number are morbidly obese. Increased BMI is often associated with increased age and our study population was significantly older than the general median age of the U.S. population. Thus, it is possible that older age contributed to increased likelihood of BPH and, subsequently, elevated serum PSA, negating the reported inverse effect of BMI on PSA levels and possibly exposing a novel association with variants in FTO.
Many of the clinical associations observed in the PheWASs of the HNP in relation to various degrees of increased BMI and the presence of obesity are part of the cluster of clinical conditions associated with metabolic syndrome (Alberti et al. 2009). Obesity is a risk factor for respiratory conditions such as chronic obstructive pulmonary disease (COPD), asthma, obstructive sleep apnea and obesity hypoventilation syndrome, and may influence the development and presentation of these diseases (Poulain et al. 2006). Accumulation of fat tissue impairs ventilatory function in adults (Lazarus et al. 1997) and increased BMI is associated with a reduction in forced expiratory volume in one second (FEV1), forced vital capacity (FVC), total lung capacity, functional residual capacity and expiratory reserve volume (Rubinstein et al. 1990;Chinn et al. 1996;Lazarus et al. 1997;Biring et al. 1999). Peripheral edema has long been recognized as associated with extreme obesity (Alexander et al. 1962). In the U.K. Community Nursing Services study, obesity was found as an independent risk factor for chronic edema (Moffatt et al. 2019).
To the best of our knowledge, this work was the first to conduct simultaneous and comprehensive genome-wide and phenome-wide association studies to validate genetic associations of BMI and severe class 2 obesity. While GWAS/PheWAS studies have been conducted in the past, the present study leveraged a unique EHR database to identify 26 SNPs across six chromosomes associated with extreme obesity. Moreover, while approximately 70% were associated with obesity and BMI in prior studies, 30% are novel with respect to these metrics, suggesting that they may be of special significance. For example, our unique approach allowed us to identify a novel polymorphism associated Figure 6 PheWAS results between extreme obesity and EHR Phenotypes. This figure illustrates the results of individual linear regression between incidence of phenotype groups (phecodes) and the incidence of extreme obesity in 2,996 individuals. Each of the 191 points represents the p-value of the association between one of 1,362 phecodes with at least 20 cases assigned to it, and extreme obesity. Statistical significance was assessed by using the False Discovery Rate of 0.1, corresponding to a raw p-value of 1.4x10 -2 . Only associations with p , 1x10 -15 are annotated for ease of viewing, represented by a horizontal line at 15 on the y-axis.
with extreme obesity in a previously uncharacterized pseudogene that is primarily expressed in the pancreas. Given that the pancreas serves an essential role in converting food into fuel and regulating blood sugar, future studies may support a critical role for these SNPs in extreme obesity cases and may suggest targeted treatment strategies.