Application of a Genetic Risk Score to Racially Diverse Type 1 Diabetes Populations Demonstrates the Need for Diversity in Risk-Modeling

Prior studies identified HLA class-II and 57 additional loci as contributors to genetic susceptibility for type 1 diabetes (T1D). We hypothesized that race and/or ethnicity would be contextually important for evaluating genetic risk markers previously identified from Caucasian/European cohorts. We determined the capacity for a combined genetic risk score (GRS) to discriminate disease-risk subgroups in a racially and ethnically diverse cohort from the southeastern U.S. including 637 T1D patients, 46 at-risk relatives having two or more T1D-related autoantibodies (≥2AAb+), 790 first-degree relatives (≤1AAb+), 68 second-degree relatives (≤1 AAb+), and 405 controls. GRS was higher among Caucasian T1D and at-risk subjects versus ≤ 1AAb+ relatives or controls (P < 0.001). GRS receiver operating characteristic AUC (AUROC) for T1D versus controls was 0.86 (P < 0.001, specificity = 73.9%, sensitivity = 83.3%) among all Caucasian subjects and 0.90 for Hispanic Caucasians (P < 0.001, specificity = 86.5%, sensitivity = 84.4%). Age-at-diagnosis negatively correlated with GRS (P < 0.001) and associated with HLA-DR3/DR4 diplotype. Conversely, GRS was less robust (AUROC = 0.75) and did not correlate with age-of-diagnosis for African Americans. Our findings confirm GRS should be further used in Caucasian populations to assign T1D risk for clinical trials designed for biomarker identification and development of personalized treatment strategies. We also highlight the need to develop a GRS model that accommodates racial diversity.


Results
GRS effectively discerns type 1 diabetes patients and AAb+ individuals from controls and relatives within a Caucasian cohort. Until now, type 1 diabetes GRS regression models put forth by Oram et al. and Patel et al. 16,17 have only been tested and validated in European Caucasian cohorts [15][16][17]. We sought to determine the efficacy of a similar GRS, calculated as previously described 16,17, in our regional southeastern U.S. cohort comprised of type 1 diabetes patients [n = 637, age (years) median (interquartile range) 15.50 (11.67-19.75)], first-degree relatives [≤1AAb+, n = 790, age 20.75 (11.29-40.42)], second-degree relatives [≤1AAb+, n = 68, age 26.79 (12.33-45.02)], at-risk relatives (≥2AAb+, n = 46, age 15.33 (10.33-33.
Higher GRS associates with a younger age at diagnosis in Caucasian subjects. We next addressed whether GRS was associated with type 1 diabetes age of onset. Indeed, among Caucasian type 1 diabetes subjects, we observed a significant negative correlation between GRS and age of diagnosis (Pearson's correlation r = −0.23, P < 0.0001, Fig. 4A). Subjects diagnosed after age 16 had lower GRS than those diagnosed from 8-16 years of age and those diagnosed under age 8 (Fig. 4B), suggesting that a higher GRS may predict earlier disease onset. Prior studies noted the HLA association with earlier onset of disease; however, the contribution of the non-HLA component of risk was not clear [25][26][27]. Our data clearly demonstrated the majority, if not all, of this negative age association was conferred by the HLA risk component. When the non-HLA loci were removed from the GRS calculation, the negative correlation with age (r = −0.25, P < 0.0001) was virtually the same as the full GRS (Fig. 4C,D). Conversely, when HLA was removed from the calculation, no association with age at diagnosis was observed (Fig. 4E,F).
We next sought to determine which HLA diplotypes may be affecting the age at diagnosis. High-risk HLA-DR3-DQ2 (simplified to DR3) and HLA-DR4-DQ8 (simplified to DR4) were imputed and subjects were categorized into six diplotypes in combination with lower-risk HLA (collectively denoted as DRX): DR3/DR4, DR4/DR4, DR3/DR3, DR4/DRX, DR3/DRX, and DRX/DRX. We observed the known contribution of HLA-DR3/ DR4 to earlier clinical onset 28 , as well as a significant difference in age of diagnosis between HLA-DR3/DR4 and HLA-DR4/DR4 subjects (Fig. 5A). Distributions of numbers (Fig. 5B) and percentages (Fig. 5C) of age of diagnosis stacked by HLA risk diplotypes illustrate the skewing of HLA-DR3/DR4 individuals to earlier diagnoses. To quantify this observation, we calculated the proportion of patients diagnosed prior to 8 years of age, from 8-16 years of age, and older than 16 years of age for each of the six HLA categories (Table 1). We found that the proportion of patients with HLA-DR3/DR4 diagnosed before age 8 (44.4%) was 5.6 times greater (P < 0.01) than those diagnosed after age 16 (7.9%), while the proportion for the other five HLA categories diagnosed before age 8 (28.9%) was only 1.5 times greater than those diagnosed after age 16 (19.1%). Conversely, significantly more patients with HLA-DR4/DR4 (P < 0.01) and DRX/DRX (P < 0.05) diplotypes were diagnosed after age 16. Interestingly, HLA-DR3/DR3 patients were more likely to be diagnosed between age 8 and 16 (P < 0.01, Table 1). These results suggest that contribution of high-risk HLA-DR3 and HLA-DR4 haplotypes to age of clinical onset may be more nuanced than previously reported [25][26][27][28] .
Oram and Patel initially used a similar GRS model as a tool to assist in the differential diagnoses of early onset type 2 diabetes and monogenic forms of diabetes from type 1 diabetes 16
Figure 2. The Genetic Risk Score (GRS) can discriminate Caucasian subjects with type 1 diabetes and high-risk relatives from controls and lower-risk relatives. (A) GRS was significantly higher among Caucasian type 1 diabetes patients (T1D, n = 478) and at-risk relatives (n = 35) compared to controls (n = 290), second-degree relatives (2° Relatives, n = 33), and first-degree relatives (1° Relatives, n = 611). (B) Receiver operating characteristic (ROC) curve shows that the GRS significantly discriminates type 1 diabetes patients from control subjects (T1D vs Controls) with 83.3% sensitivity yielding 73.9% specificity (area under curve (AUC) = 0.8598) and, to a lesser degree, type 1 diabetes patients from first-degree relatives (T1D vs Relatives) with 67.4% sensitivity yielding 65.0% specificity (AUC = 0.7163). (C) Classifying subjects as T1D or control. Peak balanced accuracy was determined to be 78.95% at a GRS of 0.251. (D) Classifying subjects as T1D or relatives. Peak balanced accuracy was 66.70% at a GRS of 0.267. (E) GRS of At-risk subjects (≥2AAb+) vs age at donation. The 75th (upper dotted), 50th (solid), and 25th (lower dotted) centile lines of the T1D GRS are shown for reference. (F) Comparison of GRS of young (<20 years old) At-risk subjects to aged (>20) At-risk, young first-degree relatives, and aged first-degree relatives. Kruskal-Wallis ANOVA with Dunn's posttest *P < 0.05, **P < 0.01, ****P < 0.0001.
age at diagnosis (Fig. 4A). Initially, we observed 3 additional type 1 diabetes subjects with exceptionally low GRS that were all AAb− at onset (data not shown).
Clinical follow-up revealed that two of these subjects (subject 1: age at diagnosis (yrs) = 10, BMI = 28.0, GRS = 0.118; subject 2: age at diagnosis = 15, BMI = 19.3, GRS = 0.153) were undergoing MODY testing (awaiting patient compliance), and the third (age at diagnosis = 16, BMI = 42.0, GRS = 0.184) has been re-diagnosed as having type 2 diabetes. These results support the utility of GRS in aiding in the differential diagnosis of diabetes forms when used in combination with standard clinical assessments and AAb detection.
Current GRS models are less robust for assessing type 1 diabetes risk in U.S. racial minority groups. We next examined the utility of GRS to discriminate type 1 diabetes subjects from controls or relatives within the Asian American and African American (includes Hispanic/Latino African Americans, Supplemental Table 2) subsets of the UFDI cohort. This notion emanates from previous HLA associations in African American type 1 diabetes subjects 29 , in addition to clear alterations in the allele frequencies of racial groups for the putative risk loci reported in the 1000 Genomes project 30 . Similar to Caucasian subjects (Fig. 2), GRS was significantly higher in Asian American type 1 diabetes subjects compared to controls (Fig. S1A). GRS appeared to accurately discriminate type 1 diabetes patients from controls (AUROC = 0.92; P = 0.0002) and from relatives with (AUROC = 0.86; P = 0.04) (Fig. S1B), although this cohort is insufficiently powered to draw conclusive results at a population scale (Supplemental Table 1). Additionally, no multiple AAb + at-risk Asian American subjects were enrolled in this study; hence, there is a need to validate these findings in a larger cohort.
Once again, GRS was significantly higher in African American type 1 diabetes patients (n = 84) compared to controls (n = 63) as well as relatives (n = 118), but the study was not sufficiently powered to detect significant differences from multiple AAb + at-risk African American subjects (n = 6, Fig. 6A; Supplemental Table 1). Within the African American cohort, we found GRS was less robust in discerning type 1 diabetes patients from controls (63.0% sensitivity, 85.3% specificity, AUROC = 0.75) or from first-degree relatives (63.0% sensitivity, 61.5% specificity, AUROC = 0.63) (Fig. 6B). Peak balanced accuracy was 68.98% at GRS = 0.233 for classifying African American subjects as type 1 diabetes patients or controls and 60.30% at GRS = 0.233 for classifying subjects as patients or relatives (Fig. 6C,D). Additionally, the HLA-mediated association between GRS and age of diagnosis observed in Caucasian patients was lost in the African American cohort (Figs S2-S3; Supplemental Table 5). HLA associated with the highest risk in Caucasians were detected in lower frequencies in African Americans, where the three highest risk HLA (HLA-DR3/DR4, -DR4/DR4, and -DR3/DR3) were only detected in African American patients and not in controls ( Fig. S3; Table 2). Importantly, the SNP array utilized herein did not impute the African American-derived HLA haplotypes shown to confer type 1 diabetes risk or protection 29 . Though only modestly powered, several non-HLA alleles tested for GRS did not confer risk in African Americans to the same degree as in Caucasians (Table 2). Notable risk differences were observed for three SNPs tested herein:

Discussion
Focused genetic testing is relatively inexpensive, non-invasive, and may be scaled for population screening efforts. The implementation of such tests may be useful for refining efforts to identify subjects who would benefit from more costly AAb and interventional screening efforts that may need to be repeated over time. Given that type 1 diabetes has known genetic components conferring susceptibility, several models designed to stratify subjects as high- and low-risk have been developed in recent years. One model was recently shown to assist in the differential diagnosis of type 1 diabetes from early-onset type 2 diabetes and from monogenic diabetes 16,17. We emulated this model to assess its capacity to stratify subgroups (i.e., controls, low- and high-risk relatives, type 1 diabetes patients) using cumulative genetic risk in our regional cross-sectional cohort. While the Oram Table 5). As expected, the GRS was higher in subjects with type 1 diabetes compared to first-degree relatives, second-degree relatives, and controls. Most importantly, the GRS was also significantly higher in relatives at the highest risk for disease development (≥2 AAb+) compared to lower-risk relatives (≤1 AAb+); this was the case even in subjects under 20 years of age (0.277 ± 0.03). We note that this current study did not measure anti-insulin autoantibodies, which may have affected subject assignment as ≤1 AAb+ relatives and ≥2 AAb+ at-risk relatives. These data support the notion that genotyping a limited number of selected SNPs allows for the identification of subjects at elevated risk for developing disease. This notion has important implications for GRS use for subject enrollment into mechanistic and natural history studies of type 1 diabetes. It also highlights potential for large-scale population screening efforts for clinical diagnostics, particularly as per-sample genotyping costs decline over time.
We acknowledge that our genotyping is by no means comprehensive, and the potential may exist to improve prediction and ROC values as additional validated loci and causative SNPs are defined. This may be particularly true regarding ROC for type 1 diabetes subjects versus relatives sharing an appreciable portion of the genome. Ultimately, long-term longitudinal studies such as TEDDY, DAISY, TrialNet Natural History Study, and BABYDIAB will be most informative for such analyses 14,15,32 .
Genetic screening may not only identify high-risk individuals, but may also indicate appropriate ages to implement other screening regimens, such as AAb testing. We found a significant negative correlation between GRS and age of type 1 diabetes diagnosis, which was nearly completely accounted for by HLA diplotype. While the highest-risk HLA-DR3/DR4 diplotype was associated with the earliest age of diagnosis, as has been previously shown [25][26][27] , diagnosis occurred significantly later and coincided with puberty in subjects carrying the HLA-DR3/ DR3 diplotype. The genes that comprise GRS account for a major proportion of the heritability of type 1 diabetes but explain much less of the variation of the heterogeneity of age of diagnosis. Improvements in this latter capacity may require better powered approaches or GWAS designed to identify genetic associations of type 1 diabetes characteristics (such as diagnosis age 33 and rate of β-cell decline), which may be distinct from the variants associated with disease development.  The SEARCH for Diabetes in Youth Study recently reported that type 1 diabetes prevalence was 1 in 392 for Caucasian Americans under 20 years of age, 1 in 617 for African Americans, and 1 in 1667 for Asian Americans 34,35 . Currently, the vast majority of type 1 diabetes genetics studies are limited to Caucasian cohorts. However, the figures above imply that for African Americans, type 1 diabetes prevalence is almost 2/3 that of Caucasian Americans, while for Asian Americans it is almost 1/4. Thus, closer examinations of type 1 diabetes genetics within these underrepresented racial minorities in the U.S. must be performed. A limitation of the current study was our reliance on self-reported race and ethnicity, as our SNP array lacked informative ancestral markers commonly utilized in high density genome-wide arrays for imputing and assigning race and ethnicity. 
Nevertheless, our analysis indicated that GRS could discriminate type 1 diabetes subjects from controls in a small cohort of subjects identifying as Asian American, but larger studies are needed to validate and extend these findings. For African Americans, however, GRS was less effective in discerning type 1 diabetes subjects from controls, and the association between a higher GRS and early disease onset was lost. These observations are likely related to known differences in HLA-conferred disease risk or protection in the context of race 27,29,36. Thus, GRS models suitable for African Americans would likely need to impute these haplotypes. Additionally, the current set of type 1 diabetes risk loci, which were identified in predominantly Caucasian cohorts, may be less effective for assessing risk in non-Caucasian individuals. These findings underscore the need to perform type 1 diabetes incidence studies and GWAS in non-Caucasian groups, enabling development of a GRS model that accounts for heterogeneity in populations.
Type 1 diabetes risk loci that significantly predict disease in African Americans but not Caucasian Americans may underlie pathophysiologic differences in disease processes between races and may explain how African Americans accrue risk without the classical high-risk HLA types originally defined in Caucasians. Here we found that risk variants of two genes, SH2B3 and GAB3, were more predictive of type 1 diabetes in African American subjects. Interestingly, both of these genes encode proteins that affect myeloid cell development and activation. SH2B3 encodes the protein LNK, which modulates cytokine signaling in myeloid cells via the signaling adapter, JAK2 [37][38][39][40][41]. Indeed, the risk variant of SH2B3/Lnk (rs3184504) is associated with altered expression of key elements of IFNγ signaling 42. Thus, it is likely that SH2B3/Lnk variants modulate myeloid innate immune cells through altered sensitivity to various cytokines. The GAB3 protein product interacts with the M-CSF receptor and drives macrophage differentiation 43. How the risk variant of GAB3 affects this process remains unknown. As these functional studies advance, it will be critical that investigators consider the race of study subjects as well as the presence of additional gene variations that may affect the same cells/pathways.
An important aspect of our study assessed the effect of Hispanic/Latino ethnicity within southeastern U.S. Caucasians on GRS. We found that the GRS robustly discriminates type 1 diabetes patients from controls in Hispanic/Latino Caucasians to the same degree as in non-Hispanic/Latino Caucasian cohorts. The prevalence of type 1 diabetes in Hispanic/Latino American youth is roughly half that of non-Hispanic/Latino Caucasians 34, yet is increasing at a greater annual rate (4.2%) than in non-Hispanic/Latino Caucasian populations (1.2%) 44. Given the concurrent increase in type 2 diabetes in non-Caucasian American youth 44, our findings may have significant implications for utilization of GRS in both research and clinical settings in these understudied populations.
Although our cross-sectional cohort does not include routine follow-up, we were able to utilize GRS to identify type 1 diabetes subjects whose diagnoses were questionable or have been changed subsequent to their enrollment, which further demonstrates the clinical utility of GRS as a tool to improve differential diagnoses of type 1 diabetes from early-onset type 2 and monogenic diabetes 16,17 . This may justify the use of GRS as a screening tool at diagnosis in order to promote the concept of precision medicine when determining which therapies may be best suited for a particular patient. Notwithstanding, since GRS only modestly discerns type 1 diabetes patients from first-degree relatives, there may be a capacity to improve the current model. The log-additive model for non-HLA risk may not be the most accurate method for computing GRS, possibly resulting in decreased specificity or loss of age-associated non-HLA risk 45 . Additionally, there may be more comprehensive methods to capture all HLA-associated type 1 diabetes-risk with more HLA variants (reviewed in 1 ). Since several loci contain genes that are predicted to confer overlapping functional effects (e.g., CD25, IL2, PTPN2 in the IL-2 signaling pathway), one may expect a GRS model to include computations that account for such genetic synergies. However, this level of genetic risk modeling remains elusive, as Winkler et al. were unable to identify genetic interactions using a more extensive genotyping panel on a much larger cohort 15 . Moreover, models using genetics alone are not expected to predict type 1 diabetes with 100% accuracy since environmental, epigenetic, and stochastic factors (e.g., immunoreceptor V(D)J gene recombination) are also thought to impact overall risk. Perhaps even more confounding is the notion that genetic and environmental risk interactions may not be static phenomena. 
This may be most evident by the concomitant trends of decreasing proportion of high risk HLA in type 1 diabetes patients and increasing overall type 1 diabetes prevalence 46,47 . All of these aforementioned factors that are missing from this GRS model may contribute an unknown amount of bias negatively impacting GRS selectivity. Ultimately, improved accuracy of diabetes prediction models will likely require a better understanding of epistatic genetic and environmental risk interactions.
The results of this and other studies imply GRS could represent a low-cost means to assist in general population screening to identify patients who have increased risk of developing type 1 diabetes. We therefore envision the utilization of GRS to guide future trial recruitment and cohort stratification efforts. These observations strengthen the argument for focused genetic screening to monitor progression in the clinic, improve functional studies, facilitate biomarker identification, and optimize subject selection for interventional and natural history trials.

Methods
Subject enrollment and sample collection. Informed  (IRB #201400709). All experiments were performed in accordance with relevant guidelines and regulations. Genomic DNA and serum samples were collected and stored at −20 °C from 1,946 research participants together termed the University of Florida Diabetes Institute (UFDI) cohort. This collection included control subjects [type 1 diabetes-unaffected and non-first-or -second-degree relatives of type 1 diabetes patients] (n = 405), first-degree relatives with ≤1 type 1 diabetes-relevant autoantibody (AAb) (n = 790), second-degree relatives with ≤1 AAb (n = 68), multiple AAb positive at-risk relatives (≥2AAb + , n = 46), and type 1 diabetes patients (n = 637). Type 1 diabetes status was assigned according to clinician diagnosis. Subjects self-reported race as Caucasian/White, African American/Black, Asian, Pacific Islander/Hawaiian, Native American/Alaskan, or Multiple/Other and separately indicated ethnicity as Hispanic/Latino or non-Hispanic/Latino ( Supplemental Tables 1 and 2; Fig. 1A). The geometric mean ± SD for the type 1 diabetes diagnosis age was 8.94 ± 2.18 years (Fig. 1B).
AAb measurement. AAbs against type 1 diabetes-related autoantigens [i.e., glutamic acid decarboxylase (GAD), insulinoma-associated protein 2 (IA-2), and zinc transporter 8 (ZnT8)] were measured from serum samples via ELISA kits (KRONUS Inc., Star, ID) according to the manufacturer's instructions 48.
DNA preparation. DNA was prepared via QiaCube high-throughput nucleic acid purification system according to manufacturer's recommendations (Qiagen, Hilden, Germany). Purified DNA was genotyped on either the custom array or manually, as described below. Samples missing HLA SNP calls or with <90% of non-HLA risk measured were excluded. The ORs of the type 1 diabetes-risk alleles were derived from Immunobase.org (Supplemental Table 3, Fig. 1D).
Single nucleotide polymorphism (SNP) selection and genotyping. The HLA-DR and HLA-DQ region plus additional loci with known associations for type 1 diabetes-risk 2 were considered for inclusion in a custom TaqMan SNP genotyping array (ThermoFisher, Carlsbad, CA). Since the list of risk loci changes as more GWAS and meta-analyses are completed, the loci in this study are limited to those that are curated on immunobase.org as of October 2017. SNP assays passed quality control (QC) when they generated >95% successful call rates and <5% intra-sample discordance. SNPs that failed QC were excluded. Some key SNPs that either failed QC on the array or were not included on the array (rs2187668, rs7454108, rs3129889, rs1264813, rs2395029, and rs2292239) were manually genotyped using validated TaqMan assays (ThermoFisher, Carlsbad, CA) 16. In total, 32 SNPs passed QC (Supplemental Table 3). The TaqMan genotyping array and individual TaqMan assays were performed according to manufacturer instructions.
In the GRS calculation, β is the natural log of the OR and s is the number of risk alleles (0, 1, or 2) carried for SNP i of n loci tested. Chromosome X SNPs in male subjects were counted as 0 or 2, which assumes a dominant risk effect in the hemizygous state. H_l is the HLA diplotype risk for combinations of DR3-DQ2, DR4-DQ8, and X. The summed risk was then divided by the number of alleles tested. This method used identical SNP imputing for class I and class II HLA as Oram et al. and Patel et al., and a partially overlapping set of SNPs to compute non-HLA risk (compare Supplemental Table 3 to Oram et al. 16).
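The score described above (log-OR-weighted risk allele counts plus an HLA diplotype term, divided by the number of alleles tested) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name, argument names, and the example values are hypothetical, and the default denominator of two alleles per genotyped non-HLA SNP is an assumption, since the text does not state whether HLA alleles are included in the count.

```python
import math


def genetic_risk_score(odds_ratios, risk_allele_counts, hla_diplotype_risk,
                       x_linked=frozenset(), is_male=False,
                       n_alleles_tested=None):
    """Sketch of a log-additive GRS with an HLA diplotype term.

    odds_ratios        -- dict: SNP id -> OR of the risk allele
    risk_allele_counts -- dict: SNP id -> 0, 1, 2, or None if not genotyped
    hla_diplotype_risk -- H_l, the risk assigned to the HLA diplotype
    x_linked           -- SNP ids on chromosome X
    n_alleles_tested   -- denominator; ASSUMED to be 2 per genotyped
                          non-HLA SNP when not supplied
    """
    total = hla_diplotype_risk
    n_genotyped = 0
    for snp, count in risk_allele_counts.items():
        if count is None:
            continue  # missing genotype contributes nothing
        if is_male and snp in x_linked:
            # Hemizygous males are scored as 0 or 2 risk alleles.
            count = 2 if count > 0 else 0
        total += math.log(odds_ratios[snp]) * count  # beta_i * s_i
        n_genotyped += 1
    if n_alleles_tested is None:
        n_alleles_tested = 2 * n_genotyped
    if n_alleles_tested <= 0:
        raise ValueError("no alleles were tested")
    return total / n_alleles_tested


# Hypothetical example values, for illustration only.
example_ors = {"snp_a": 2.0, "snp_b": 1.5, "snp_x": 1.2}
example_counts = {"snp_a": 1, "snp_b": 2, "snp_x": 1}
example_score = genetic_risk_score(example_ors, example_counts, 3.0,
                                   x_linked={"snp_x"}, is_male=True)
```

Passing `n_alleles_tested` explicitly lets a reader try other conventions for the denominator without changing the rest of the sketch.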
Statistics. Data were graphed and analyses performed using GraphPad Prism software version 7 (San Diego, CA). Data are presented as ROC curve with AUC, as Tukey box and whisker plots or mean ± SD bar graphs compared via Kruskal-Wallis with Dunn's multiple comparisons testing, scatter plots with linear regression and Pearson correlation, or in tabular form and compared via Fisher's exact test. Fisher's exact test was performed using the SciPy package (version 0.18.1, https://scipy.org/) in Python 3. Balanced accuracy was calculated for thresholds across the GRS range as [(correctly predicted T1D/actual T1D) + (correctly predicted non-T1D/actual non-T1D)]/2, i.e., the mean of sensitivity and specificity. Significance was defined as P < 0.05.
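The balanced-accuracy scan and the AUC named above can be illustrated with a short, dependency-free sketch. The function names and the toy scores are hypothetical, and the convention that a subject is called T1D when the score is at or above the threshold is an assumption; the published analysis was run in GraphPad Prism, so this is only a conceptual equivalent.

```python
def balanced_accuracy(scores, labels, threshold):
    """Mean of sensitivity and specificity at one GRS threshold.

    labels: 1 for T1D, 0 for non-T1D. A subject is predicted T1D when
    score >= threshold (assumed convention).
    """
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    true_pos = sum(1 for s, y in zip(scores, labels) if y and s >= threshold)
    true_neg = sum(1 for s, y in zip(scores, labels) if not y and s < threshold)
    return (true_pos / n_pos + true_neg / n_neg) / 2


def peak_balanced_accuracy(scores, labels):
    """Scan every observed score as a threshold; return (threshold, accuracy)."""
    candidates = sorted(set(scores))
    best = max(candidates, key=lambda t: balanced_accuracy(scores, labels, t))
    return best, balanced_accuracy(scores, labels, best)


def auroc(scores, labels):
    """AUC as the probability that a random T1D score exceeds a random
    non-T1D score, counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, for illustration only (not study data).
toy_scores = [0.30, 0.28, 0.26, 0.24, 0.27, 0.22, 0.20, 0.25]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_threshold, toy_accuracy = peak_balanced_accuracy(toy_scores, toy_labels)
toy_auc = auroc(toy_scores, toy_labels)
```

Scanning only the observed scores is sufficient because balanced accuracy can change only when the threshold crosses a data point.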