Damaging missense variants in IGF1R implicate a role for IGF-1 resistance in the etiology of type 2 diabetes

Summary Type 2 diabetes (T2D) is a heritable metabolic disorder. While population studies have identified hundreds of common genetic variants associated with T2D, the role of rare (frequency < 0.1%) protein-coding variation is less clear. We performed exome sequence analysis in 418,436 (n = 32,374 T2D cases) individuals in the UK Biobank. We identified previously reported genes (GCK, GIGYF1, HNF1A) in addition to missense variants in ZEB2 (n = 31 carriers; odds ratio [OR] = 5.5 [95% confidence interval = 2.5–12.0]; p = 6.4 × 10−7), MLXIPL (n = 245; OR = 2.3 [1.6–3.2]; p = 3.2 × 10−7), and IGF1R (n = 394; OR = 2.4 [1.8–3.2]; p = 1.3 × 10−10). Carriers of damaging missense variants within IGF1R were also shorter (−2.2 cm [−1.8 to –2.7]; p = 1.2 × 10−19) and had higher circulating insulin-like growth factor-1 (IGF-1) protein levels (2.3 nmol/L [1.7–2.9]; p = 2.8 × 10−14), indicating relative IGF-1 resistance. A likely causal role of IGF-1 resistance was supported by Mendelian randomization analyses using common variants. These results increase understanding of the genetic architecture of T2D and highlight the growth hormone/IGF-1 axis as a potential therapeutic target.


In brief
Gardner et al. queried the genomes of over 400,000 individuals and identified novel genes associated with type 2 diabetes risk. The biological function of these genes highlights potentially new therapeutic avenues for treatment of type 2 diabetes.

INTRODUCTION
Type 2 diabetes (T2D) is a complex disease characterized by insulin resistance and b-cell dysfunction. An estimated 630 million adults are expected to have T2D by 2045, 1 making it one of the fastest growing global health challenges of the 21st century. Genome-wide association studies (GWASs) have successfully identified more than 500 genomic loci to be associated with T2D, 2 although the majority of these are driven by common variants with small individual effects on T2D risk.
Over 90% of GWAS loci lie in non-coding regions of the genome, 3 presenting a major hurdle for the identification of the underlying causal genes and the translation of these findings into mechanistic insight. In contrast, analysis of rare proteincoding variation captured by DNA sequencing has the potential to more directly implicate individual genes and biological mechanisms. The UK Biobank (UKBB) 4 study recently made exome sequencing (ES) data available for 454,787 UKBB participants. 5 This offers an unprecedented opportunity to explore the contri-bution of rare coding variations to the risk of T2D with much greater power than previously possible. [6][7][8] Initial exome-wide association analyses of these data have identified gene-based associations with increased risk of T2D for GCK, HNF1A, HNF4A, GIGYF1, CCAR2, TNRC6B, and PAM and protective effects for variants in FAM234A and MAP3K15. 5,[9][10][11][12][13][14] In this study, we combined multiple sources of health record data to identify additional T2D cases and used an extended range of variant classes and allele frequency cutoffs in order to directly implicate novel genes in the etiology of T2D. Our results highlight a number of previously missed associations and support a role for insulin-like growth factor 1 (IGF-1) resistance in the pathogenesis of T2D.

RESULTS
Exome-wide burden testing in the UKBB To identify genes associated with T2D risk, we performed an exome-wide association study (ExWAS) using ES data derived from 418,436 European genetic ancestry UKBB participants. 5 As our primary outcome, we identified 32,374 (7.7%) participants with likely incident or prevalent T2D using phenotype curation that integrated multiple data sources, including hospital episode statistics, self-reported conditions, death records, and use of T2D medication (see STAR Methods).
Individual gene burden tests were performed by collapsing genetic variants across 18,691 protein-coding genes in the human genome. We tested four functional categories across two population prevalences (minor allele frequency <0.1% and singletons), including high-confidence protein-truncating variants (PTVs), missense variants stratified by two REVEL score thresholds, 15 and synonymous variants as a negative control ( Figure 1; STAR Methods). We identified 13 gene-functional annotation pairs with 30 or more rare allele carriers, representing 7 nonredundant genes, associated with T2D at exome-wide statistical significance (p < 6.9 3 10 À7 ; Table S1; STAR Methods). Our results are statistically well calibrated, as indicated by both low exome-wide inflation scores (e.g., PTV l = 1.047) and the absence of significant associations with synonymous variant burden ( Figures 1B-1E and S1). To ensure our results were not biased by our approach, we implemented burden tests using both STAAR 16 and a logistic model and arrived at substantially similar conclusions ( Figure S2; Table S1; STAR Methods).
We confirmed the T2D associations at all three genes identified by three previous studies of T2D risk that incorporated European genetic ancestry individuals from the UKBB study: 11,12,14 Figure 2). As in these previous studies, we similarly found that carriers of PTVs within these genes had substantially increased risk for developing T2D ( Figure 1A). In support of other recent studies, 18 we note surprisingly low penetrance rates for GCK (77% of PTV carriers with T2D) and HNF1A (48%) for effects that are clinically considered to be highly/completely penetrant for monogenic forms of maturity-onset diabetes of the young (MODY ; Table S1). Future studies will be needed to better estimate the true penetrance, which will lie somewhere between the likely underestimates obtained from healthy population studies such as the UKBB 19 and the inflated estimates obtained from clinical genetics referrals.
We also confirmed the T2D association at TNRC6B (n = 35; OR = 10.5 [5.3-21.0]; p = 1.4 3 10 À14 ), which was previously reported as ''potentially spuriously associated'' with T2D risk; 11 several additional lines of evidence provide confidence in this association. Firstly, our result is not attributable to a single variant of large effect as evidenced by the strength of association with singleton variants ( Figure 1A). Secondly, aside from a single individual carrying two balanced deletions, inspection of the underlying ES reads did not reveal a markedly increased error rate in variant calling or genotyping in TNRC6B as was suggested by Deaton et al. 11 Thirdly, the association persisted after excluding 14 individuals who carry a singleton PTV in a potentially nonconstitutive exon as measured by the proportion expressed across transcripts score (p = 3.6 3 10 À7 ). 20 Finally, we also found that TNRC6B PTV carriers had elevated HbA1c levels when considering both T2D cases (4.1 mmol/mol [2.5-5.7]; p = 7.2 3 10 À7 ; Figure S3) and controls (1.6 mmol/mol [0.2-2.1]; p = 1.8 3 10 À2 ), consistent with the elevated long-term blood glucose levels observed in individuals with T2D.
We also identified three additional genes that, when disrupted by rare genetic variation (minor allele frequency <0.1% or singletons), are associated with increased T2D risk:  Figure 1). Unlike previously reported genes outlined above, damaging missense variants, but not PTVs, in these genes were associated with T2D risk (Figure 2). Indeed, in these genes, T2D associations were apparent only with missense variants with high REVEL scores (R0.7) or those variants considered to be the most damaging as per current (2020) Association for Clinical Genomic Science guidelines.
Exploring common variant associations at highlighted genes We next attempted to cross-validate the rare-variant associations for all seven exome-wide significant genes by identifying proximal common variants (±50 kb of a gene's coding sequence) previously reported to be associated with related glycemic or metabolic phenotypes (STAR Methods; Table S2). Four genes fell within glycemic trait-associated loci, and all seven overlapped known metabolic trait associations. For several of these common variant-phenotype combinations, we also identified an association with rare variant burden ( Figure S3). Additionally, three of the four novel genes we report here were identified in the most recent publicly available T2D GWAS 2 as being either the closest or most likely causal gene for a common variant genome-wide significant signal: IGF1R, TNRC6B, and ZEB2 (Table S2).

. Relationship between cumulative minor allele frequency and odds ratio for T2D
Plotted is T2D risk as quantified by odds ratio versus cumulative minor allele frequencies (cMAFs) for genes significantly associated with T2D risk (n = 418,436 participants). For each gene, only the most significantly associated variant mask is shown. Error bars indicate 95% confidence intervals.
in moderate linkage disequilibrium in European populations (R 2 = 75.5%), 22 and are expression quantitative trait loci (eQTLs) for IGF1R. 23 Furthermore, for both SNPs, the IGF1R expression-lowering alleles are associated with higher levels of circulating IGF-1 (p = 7 3 10 À7 and 9 3 10 À7 , respectively; Figure 3) 24 and with higher T2D risk and fasting glucose, although colocalization analyses could not confirm that these effects were driven by the same signals (PP3 = 1, PP4 = 2 3 10 À6 ; Figure 3), likely due to the presence of multiple independent signals. This result, in light of our rare variant association, highlights the limitations of reliance on common variant approaches.
To explore how damaging missense variants disrupt IGF1R function, we next categorized variants by protein domain. Carriers of qualifying variants within the IGF1R protein kinase (residues 999-1274) 25 had a higher risk for T2D (n = 179; OR = 3.4 [2.3-4.9]; p = 1.9 3 10 À10 ) than those with qualifying variants outside this domain (n = 215; OR = 1.7 [1.2-2.6]; p = 8.2 3 10 À3 ). This difference is not statistically significant in the current sample (heterogeneity p = 0.40); however, we speculate that dysfunction within the protein kinase domain might decrease downstream signal transduction resulting in IGF-1 resistance. This may also explain why, despite the relatively large number of IGF1R (n = 64) PTV carriers in the UKBB, we did not find that IGF1R PTV carriers had increased T2D risk. When bound by IGF-1 and to induce downstream signal transduction, IGF1R functions as a homo-or heterodimer (i.e., with INSR as a hybrid receptor). 26 As half of a missense carrier's IGF1R molecules will contain errors in the protein kinase domain, dimerization will incorporate at least one defective molecule 75% of the time and therefore lead to reduced downstream signal transduction. In the case of PTV carriers, since one copy is likely missing due to nonsense-mediated decay, dimerization will always Cell Genomics 2, 100208, December 14, 2022 3 Article ll OPEN ACCESS incorporate two functional copies. Therefore, the association of damaging IGF1R missense variants with T2D may be due to a dominant-negative effect rather than decreased protein abundance; however, additional functional studies are ultimately required to confirm the mechanism underlying these variants.
To explore whether rare variants in other components of the growth hormone (GH)-IGF1 hormone pathway might influence T2D risk, we next identified a further nine genes in the GH-IGF1 pathway that showed gene-burden associations with circulating IGF-1 levels in any of our burden tests (Table S3), including seven genes with known roles in regulating GH secretion or GH signaling and three genes with known roles in IGF-1 bioavailability. We tested their associations with childhood and adult height to indicate the functional relevance of rare variations in these genes. None of the seven GH-related genes showed any association with T2D. Rare damaging variants in IGFALS, which encodes a component of the IGF-1 ternary complex, lowered circulating IGF-1 and were nominally associated with shorter childhood height (indicative of lower IGF-1 bioactivity) and higher risk of T2D. Rare damaging variants in IGFBP3 (the major IGF binding protein), which lowered circulating IGF-1 levels, were nominally associated with taller childhood height (indicative of higher IGF-1 bioactivity) and lower risk of T2D. Hence, damaging rare variants that disrupt IGF-1 bioactivity, but not those that primarily alter GH secretion or signaling, appear to increase T2D risk.
Causality of IGF-1 levels with T2D risk A previous phenotypic observational study described a protective association between baseline circulating IGF-1 protein levels and incident T2D. 27 However, subsequent similar studies found no such association, 28,29 and conversely, a previous study that modeled common genetic variants in a Mendelian randomization framework inferred an adverse causal effect of higher circulating IGF-1 levels on T2D. 30 To explore this apparent inconsistency, we examined the likely causal role of IGF-1 on T2D by modeling 784 independent common genetic signals for circulating IGF-1 levels identified in 428,525 White European UKBB individuals 24 and summary sta-tistics from the largest reported GWAS meta-analysis of T2D. 31 We confirmed the previously reported 30 association between genetically predicted higher IGF-1 levels and higher risk of T2D in inverse-variance weighted (IVW; OR = 1.105 per SD [95% CI 1.039-1.170]; p = 2.9 3 10 À3 ) and sensitivity models (Table S4). However, we noted substantial heterogeneity in the relationships between individual IGF-1 signals and T2D (I-square = 85.7%) as well as in their associations with adult height (IVW beta = 0.142; p = 8.9 3 10 À9 ; I-square = 97.7%). Among the common genetic instruments for higher circulating IGF-1 levels, individual variants at the IGF1 locus (rs11111274) and the IGF1R locus (rs1815009) show directionally opposite effects on childhood height and T2D (taller height and lower T2D risk for IGF1; shorter height and higher T2D risk for IGF1R; Figure S4). Hence, reported common variant instruments for higher IGF-1 levels comprise a mixture of functionally opposing signals, i.e., higher levels of bioactive IGF-1 or higher IGF-1 resistance.

DISCUSSION
Here, we present the results of an ExWAS to assess the contribution of rare variant burden to T2D risk (Figure 1). We identified three genes previously reported by a recent analysis of the UKBB (GCK, HNF1A, and GIGYF1), 14 provided stronger evidence for a previously nominally associated gene (TNRC6B), 11 and identified three new genes (ZEB2, MLXIPL, and IGF1R) where rare variants increase susceptibility to T2D (Figure 2). Using publicly available data, we showed that common variation nearby these genes is associated with a wide range of glycemic and metabolic traits ( Figure 3; Table S2), 2,21 providing further support for these rare variant associations. We further interrogated rare and common variant associations to show that disruption of IGF1R due to damaging missense variants in the cytoplasmic protein kinase domain leads to IGF-1 resistance and higher T2D risk. Overall, our results implicate a wider protective effect of IGF-1 bioactivity on susceptibility to T2D.
While our results are complementary to previous ExWAS, 11,14 we clarified evidence linking TNRC6B to T2D and identified three additional genes missed by previous analyses of the UKBB. A key advantage of our approach was to carefully curate multiple data sources to identify and validate T2D cases. Furthermore, we used a different genetic analytical approach from those previous studies. Nag et al. 14 limited their burden testing to either PTVs, the findings of which we replicate here, or to missense variants with comparatively low deleteriousness scores (REVEL >0.25 or missense tolerance ratio intragenic percentiles %50%). In this study, we have shown the benefit of considering missense variants computationally predicted to be severely damaging (REVEL R0.5 and 0.7). 15 While such variants are much rarer in the population-only $8% of missense variants in UKBB have REVEL scores R0.7-they are much more likely to disrupt protein function and thus increase risk for disease. These conclusions are similar to those shown previously for anthropometric traits, 10 which have shown a relationship between PTVs in IGF1R and several growth measures, but not for damaging missense variants.
A key finding of our work is the association between IGF1R and T2D risk. Loss-of-function mutations in IGF1R have been reported in children presenting with intrauterine growth restriction, short stature, and elevated IGF-1 levels. [32][33][34] Our findings of rare damaging variants at IGF1R, and also at IGFALS and IGFBP3, indicate that reduced IGF-1 bioactivity and signaling increases risk for T2D. There are several plausible mechanisms to link IGF1R to T2D. IGF1R, responding to both systemic and locally generated IGF-1, plays a role in the development of several tissues central to the control of glucose metabolism including pancreatic islets, adipose tissue, and skeletal muscle. 35 An alternative explanation involves the complex relationship between GH and IGF-1. GH, produced in a highly controlled and pulsatile manner from the somatotropes of the anterior pituitary, is the major stimulus to the hepatic expression and secretion of IGF-1, the major source of this circulating hormone. GH also has metabolic effects that are independent of IGF-1, largely exerted by its powerful lipolytic effects in adipose tissue, [36][37][38][39][40][41] which, if uncontrolled, can lead to the accumulation of ectopic lipid in non-adipose tissue, resulting in insulin resistance. This is elegantly demonstrated by studies in mice in which IGF-1 is selectively deleted in the liver. 42,43 These mice show a striking increase in circulating GH levels accompanied by marked insulin resistance, which is entirely abrogated by the blockade of GH signaling. This model can explain the insulin resistance and frequent T2D seen in conditions such as acromegaly, where GH levels are persistently raised due to a functional somatotrope tumor, 44 and the striking protection from T2D seen in patients with Laron dwarfism, whose markedly reduced circulating IGF-1 levels are due to biallelic loss-of-function (LoF) mutations in the GH receptor. 45 LoF mutations in IGF1R are likely to result in compensatory increases in GH secretion and, consequently, the higher levels of circulating IGF-1 that we observed in the carriers of such mutations. While this may partially compensate for impairment in IGF1R function, the lipolytic effects of GH are likely to have a deleterious effect on systemic glucose metabolism. Of note in this regard, a single human proband with a homozygous LoF mutation in IGF1 had elevated circulating GH and severe insulin resistance. 46,47 Therapy with exogenous IGF-1 resulted in sup-pression of GH and a dose-dependent improvement in insulin sensitivity. 47 Accordingly, genetic variants that primarily reduce GH secretion and signaling would lead to reduced IGF-1 bioactivity but without the consequent effects of elevated GH on fatty acid metabolism and insulin resistance and, hence, no alteration in T2D risk. We propose that currently available drugs that reduce GH secretion or block its action may have metabolic benefits in patients with T2D and damaging missense variants in the protein kinase domain of IGF1R.
Our findings also demonstrate the challenge of interpreting Mendelian randomization results of circulating biomarkers. Elevated biomarker levels may reflect higher levels of secretion and biomarker activity or can be increased by mechanisms that reduce biomarker bioavailability or sensitivity. Hence, genetic instruments for higher biomarker levels may comprise a mixture of markers for higher or lower biomarker activity. To distinguish these actions, we suggest that individual common variants are first tested for association with some indicator of biomarker activity (i.e., childhood height as an indicator of IGF-1 activity).
Our rare variant analysis also implicates MLXIPL as a T2D susceptibility gene for the first time. MLXIPL encodes the carbohydrate response element-binding protein (CHREBP), a transcription factor that acts in concert with its obligate binding partner MLX to regulate the cellular response to carbohydrate [48][49][50] and is highly expressed in liver, fat, and muscle. Global or tissue-specific ablation of MLXIPL in mice impairs insulin sensitivity. [51][52][53][54] Common variants at the MLXIPL locus associate with SHBG, a biomarker of insulin sensitivity 55 and with serum triglycerides. 56,57 Notably, MLXIPL is one of the 26-28 genes deleted in Williams syndrome, the result of a deletion of contiguous genes on chromosome 7q11.23. Patients with this syndrome are characterized by marked insulin resistance and an increased risk of diabetes. 58 It seems likely that haploinsufficiency for MXLIPL contributes to the metabolic disturbances characteristic of Williams syndrome.
We acknowledge several limitations of our study. Independent replication was restricted by the limited availability of similar large whole-exome sequencing (WES) studies, although common variant associations at IGF1R, TNRC6B, and ZEB2 provide some confirmation that these genes contribute to T2D etiology. We relied on in silico predictions of the functional consequences of rare variants, and future work is required to experimentally characterize their impacts on protein function and specifically to test the hypothesized dominant-negative effect of missense variants in IGF1R. Finally, it is recognized that the UKBB sample is affected by ''healthy volunteer bias,'' 59 and this may potentially attenuate true disease associations.
Overall, our findings suggest that deeper interrogation of multiple variant types when performing ExWAS can and will lead to the discovery of additional genes associated with a wide range of human diseases.

STAR+METHODS
Detailed methods are provided in the online version of this paper and include the following:

Lead contact
Further information and requests for resources should be directed and will be fulfilled by the lead contact, John R. B. Perry (john. perry@mrc-epid.cam.ac.uk).

Materials availability
Rare variant burden testing summary statistics are included in the supplemental information of this paper. Protected UK Biobank participant data will be returned to the UK Biobank resource and be accessible via application number 9905.

Data and code availability
Code used to perform analyses and generate figures and tables for this manuscript is available on github at the following repository: https://github.com/eugenegardner/T2D_IGF1R; an unchanging version of this code at the time of publication is also available on Zenodo: https://doi.org/10.5281/zenodo.7057670. Derived variables and summary statistics generated as part of this publication have been returned to the UK Biobank and will be made available via their returns catalog under application 9905.

METHOD DETAILS
UK biobank data processing and quality control To conduct rare variant burden analyses outlined in this publication, we queried ES data for 454,787 individuals provided by the UKBB study. 5 Individuals were excluded from further analysis if they had excess heterozygosity, autosomal variant missingness on genotyping arrays R5%, or were not included in the subset of phased samples as defined in Bycroft et al. 60  Article ll OPEN ACCESS all study participants who were not of broadly European genetic ancestry, leaving a total of 420,162 individuals for further analysis. Additional individuals were subsequently excluded when assessing specific phenotypes due to the incomplete nature of survey data for somephenotypes (Table S5).
To perform variant quality control and annotation, we utilised the UKBB Research Analysis Platform (RAP; https://ukbiobank. dnanexus.com/). The RAP is a cloud-based compute environment which provides a central data repository for UKBB ES and phenotypic data. Using bespoke applets designed for the RAP, we performed additional quality control of ES data beyond that already documented in Backman et al. 5 Using provided population-level Variant Call Format (VCF) files, we first split and left-corrected multi-allelic variants into separate alleles using 'bcftools norm'. 61 Next, we performed genotype-level filtering using 'bcftools filter' separately for Single Nucleotide Variants (SNVs) and Insertions/Deletions (InDels) using a missingness-based approach. With this approach, SNV genotypes with depth <7 and genotype quality <20 or InDel genotypes with a depth <10 and genotype quality <20 were set to missing (i.e. ./.). We further tested for an expected alternate allele contribution of 50% for heterozygous SNVs using a binomial test; SNV genotypes with a binomial test p. value % 1 3 10 À3 were set to missing. Following genotype-level filtering we recalculated the proportion of individuals with a missing genotype for each variant and filtered all variants with a missingness value >50%.
We next annotated variants using the ENSEMBL Variant Effect Predictor (VEP) v104 62 with the '-everything' flag and plugins for REVEL, 15 CADD, 63 and LOFTEE 64 enabled. For each variant, we prioritised a single ENSEMBL transcript based on whether or not the annotated transcript was protein-coding, MANE select v0.97, or the VEP Canonical transcript, respectively. Individual consequence for each variant was prioritised based on severity as defined by VEP. Following annotation, we grouped stop gained, frameshift, splice acceptor, and splice donor variants into a single Protein Truncating Variant (PTV) category. Missense and synonymous variant consequences are identical to those defined by VEP. Only autosomal or chrX variants within ENSEMBL protein-coding transcripts and within transcripts included on the UKBB ES assay were retained for subsequent burden testing.

Exome-wide association analyses in the UK biobank
To perform rare variant burden tests using filtered and annotated ES data, we employed a custom implementation of BOLT-LMM v2.3.6 17 for the RAP. BOLT-LMM expects two primary inputs: i) a set of genotypes with minor allele count >100 derived from genotyping arrays to construct a null model and ii) a larger set of imputed variants to perform association tests. For the former, we queried genotyping data available on the RAP and restricted to an identical set of individuals used for rare variant association tests. For the latter, and as BOLT-LMM expects imputed genotyping data as input rather than per-gene carrier status, we created dummy genotype files where each variant represents one gene and individuals with a qualifying variant within that gene are coded as heterozygous, regardless of the number of variants that individual has in that gene. To test a range of variant annotation categories across the allele frequency spectrum, we created dummy genotype files for minor allele frequency <0.1% and singleton high confidence PTVs as defined by LOFTEE, missense variants with REVEL R0.5, missense variants with REVEL R0.7, and synonymous variants. For each phenotype tested, BOLT-LMM was then run with default parameters other than the inclusion of the 'lmmInfOnly' flag. When exploring the role of rare variants in the IGF-1/GH axis and to incorporate less deleterious missense variants, we also used an additional set of variant annotations which combined missense variants with CADD R25 and high confidence PTVs (i.e. Damaging; Table S3). To derive association statistics for individual variants, we also provided all 26,657,229 individual markers regardless of filtering status as input to BOLT-LMM. All tested phenotypes were run as continuous traits corrected by age, age, 2 sex, the first ten genetic principal components as calculated in Bycroft et al., 60 and study participant ES batch as a categorical covariate (either 50 k, 200 k, or 450 k). For phenotype definitions used in this study, including our T2D definition adapted from Eastwood et al., 65 please refer to Table S5. Only the first instance (initial visit) was used for generating all phenotype definitions unless specifically noted in Table S5.
To provide an orthogonal approach to validate our BOLT-LMM results, we also performed per-gene burden tests with STAAR 16 and a generalised linear model as implemented in the python package 'statsmodels'. 66 To run STAAR, we created a custom Python and R workflow on the RAP. VCF files were first converted into a sparse matrix suitable for use with the R package 'Matrix' using 'bcftools query'. Using the 'STAAR' R package, we first ran a null model with identical coefficients to BOLT-LMM and a sparse relatedness matrix with a relatedness coefficient cutoff of 0.125 as described by Bycroft et al. 60 We next used the function 'STAAR' to test all protein-coding transcripts as outlined above. To run generalised linear models, we used a three step process. First, we ran a null model with all dependent variables as continuous traits, corrected for control covariates identical to those included in BOLT-LMM. Next, using the residuals of this null model, we performed initial regressions on carrier status to obtain a preliminary p. value. Finally, for individual genes that passed a lenient p. value threshold of <1 3 10 À4 , we recalculated a full model to obtain exact test statistics with family set to 'binomial' or 'gaussian' if the trait was binary or continuous, respectively. Generalised linear models utilised identical input to BOLT-LMM converted to a sparse matrix.
To conduct heterogeneity tests between two calculated odds ratios, we implemented a Z test as sample sizes were sufficiently large. 67 This approach was used to determine if missense variants within the IGF1R protein kinase domain have a stronger effect on T2D risk than those elsewhere in IGF1R. A Z-score was calculated by dividing the difference in the coefficients by the standard error of the difference, as shown below: Z = (Beta from model 1) -(Beta from model 2)/((SE of Beta from model 1) 2 + (SE of Beta from model 2) 2 ) 1/2 P-values were estimated for Z scores from a normal distribution.