Genome-wide aggregated trans-effects on risk of type 1 diabetes: A test of the “omnigenic” sparse effector hypothesis of complex trait genetics

Summary The “omnigenic” hypothesis postulates that the polygenic effects of common SNPs on a typical complex trait are mediated through trans-effects on expression of a relatively sparse set of effector (“core”) genes. We tested this hypothesis in a study of 4,964 cases of type 1 diabetes (T1D) and 7,497 controls by using summary statistics to calculate aggregated (excluding the HLA region) trans-scores for gene expression in blood. From associations of T1D with aggregated trans-scores, nine putative core genes were identified, of which three—STAT1, CTLA4 and FOXP3—are genes in which variants cause monogenic forms of autoimmune diabetes. Seven of these genes affect the activity of regulatory T cells, and two are involved in immune responses to microbial lipids. Four T1D-associated genomic regions could be identified as master regulators via trans-effects on gene expression. These results support the sparse effector hypothesis and reshape our understanding of the genetic architecture of T1D.

Correspondence paul.mckeigue@ed.ac.uk Iakovliev et al. investigate whether the effects of common SNPs on risk of type 1 diabetes are mediated through trans-effects on expression of core genes. By testing for association with genotypic scores that aggregate predicted trans-effects, they identify a set of putative core genes for type 1 diabetes.

Introduction
The ''omnigenic'' hypothesis postulates that most of the genetic effects on a typical complex trait are mediated through weak trans-effects of common variants that coalesce on expression of a relatively sparse set of ''core'' effector genes in relevant tissues. 1 The rationale for this hypothesis is based on two established findings: (1) for a typical complex trait, most of the heritability is accounted for by small effects of many common variants; (2) for most genes, about 70% of the heritability of expression is attributable to trans-acting variants of small effect. 2 Genes that have trans-effects on expression of multiple core genes are termed ''peripheral master regulators.'' 3 As cis-expression quantitative trait loci (cis-eQTLs) usually have much larger effect sizes than trans-eQTLs, testing one SNP at a time for association of a disease with variants influencing gene expression will detect mostly cis-effects. If gene expression affects disease risk but most of the heritability of expression is attributable to trans-effects, the associations of the disease with aggregated effects of trans-acting variants on expression of the gene should generally be stronger than the associations with cis-acting variants in that gene. Thus, it may be possible to identify core genes by using summary results of genome-wide association studies (GWASs) of gene expression in relevant tissues to aggregate the trans-effects on expression of each gene as a genome-wide trans-score, then testing these trans-only genotypic scores, one gene at a time, for association with the disease.
For a disease-associated target gene that is identified only through trans-effects, any evidence of a cis-effect provides independent validation. Such evidence may come from association of disease with a cis-eQTL score for the target gene or simply with SNPs in or near the target gene. The most compelling evidence for causality is if the target gene identified through trans-effects on expression is a monogenic cause of the disease under study.
A practical limitation to applying this approach to investigating the genetic architecture of a complex trait is that GWAS results based on very large samples are required to learn small trans-effects on gene expression, which are then aggregated in genome-wide trans-scores. For gene expression in whole blood it is now feasible to construct such trans-scores, as results of a meta-analysis based on 31,684 individuals are now available from the eQTLGen Consortium. 4 This approach may be especially applicable to autoimmune diseases such as type 1 diabetes (T1D) because leukocytes in blood are a relevant tissue in which to study effects on gene expression. About half the genetic information for discrimination in T1D (defined as the logarithm of the sibling recurrence risk ratio) is accounted for by the HLA region 5 and about one-quarter by the top 40 SNPs outside the HLA region. 6 Thus, although the genetic architecture of T1D is less polygenic than that of most other complex traits, there is still a substantial contribution from polygenic effects.
The objective of this study was to test whether the omnigenic model applies to T1D with a large case-control dataset. We sought to determine whether putative core genes could be identified from associations of T1D with genome-wide trans-scores computed by aggregating the effects of multiple trans-eQTLs or trans-QTLs for circulating protein levels (trans-pQTLs). We describe also a conventional SNP-by-SNP GWAS analysis and the results of using cis-eQTL and cis-pQTL scores to investigate possible mediators of SNP associations with T1D that have been reported previously.

Subjects and methods
The study was carried out in accordance with the ethical principles in the Declaration of Helsinki and was approved by the Tayside Research Ethics Committee (reference 10/S1402/43). Informed consent was obtained from all participants.

Cases and controls
The Scottish Diabetes Research Network Type 1 Bioresource (SDRNT1BIO) is a consented cohort of 6,127 people clinically diagnosed as having T1D and aged 16 years and older at recruitment between 2010 and 2013. It comprises one-third of the adult population with T1D in Scotland. 7 Questionnaire data and entry day samples from this cohort were linked to clinical data from the Scottish Care Information Diabetes Collaboration electronic health records. The cases were restricted to those with definite T1D, defined as at least two insulin prescriptions within 1 year of diagnosis and no reported diagnosis of monogenic diabetes on questionnaire. Those who at entry had plasma C-peptide greater than 600 pmol/L and were negative for three auto-antibodies (glutamic acid decarboxylase, tyrosine phosphatase-related islet antigen 2, and zinc transporter 8) were excluded as ''possible type 2,'' as described previously. 8 The controls were participants in the Generation Scotland Family Health Study, a family-based cohort of 24,000 volunteers across Scotland aged 18 years and older at recruitment between 2006 and 2011. 9 Individuals were excluded if they had any record of diabetes based on self report, linkage to hospital records, or any prescription of insulin or antidiabetic oral medications.

Genotyping
The SDRNT1BIO cohort was genotyped with the Illumina Human Core Exome 24 bead array in the Center for Public Health Genomics lab in the University of Virginia. The Generation Scotland cohort was genotyped with the Illumina OmniExpressExome 8V 1-2A bead array. We applied a standard quality control (QC) pipeline to the allele signals including SNP calling with zCall, 10 alignment to human genome build hg19 (GRCh37) and filtering of SNPs to exclude those with minor allele frequency less than 1%, those that were monomorphic in the 1000 Genomes 11 and UK10K 12 reference panels, and those with genotype frequencies deviant from Hardy-Weinberg equilibrium (p < 10 À10 for non-HLA regions and p < 10 À12 for the HLA region).
Genotypes from case and control cohorts were combined and checked for heterozygosity and sex concordance. After pruning to remove SNPs in linkage disequilibrium, we calculated the genotype relationship matrix (correlation between genotype vectors). We performed pruning to remove related individuals until there were no remaining pairs of individuals with genotype correlation > 0.05. Principal components were calculated across the remaining individuals. The genotypes were phased with the Eagle 13 and ShapeIt (duoHMM) 14 software packages. We performed imputation against UK10K and 1000 Genomes reference panels by using the Sanger Institute's online imputation service. SNP positions were lifted to genome build hg38 (GRCh38) for all analyses after this step.

Statistical analyses
For the SNP-by-SNP GWAS, we tested each SNP for association with T1D in a logistic regression model adjusted for sex, age, and the first three genotypic principal components by using SNPTEST 15 with imputed genotype probabilities. SNPs with minor allele frequency less than 0.5% or imputed information content less than 70% (as calculated by SNPTEST) were excluded. We annotated the results with genomic regions previously reported to be associated with T1D. These were based on Robertson et al. (2021). 16 This dataset has 224 hit regions, of which 163 are outside the HLA region. The candidate gene assigned for each hit region is that given in the original tables from the Type 1 Diabetes Knowledge Portal (https://t1d.hugeamp.org/), Supplementary Table 9 (dominant/recessive models) of Robertson et al. (2021), 16 and Supplementary  17 The start and end positions of the transcription site of each gene (not including the promoter region) were obtained from Ensembl. 18 We next constructed scores for aggregated trans-effects on gene expression and tested for association with T1D. For gene expression in whole blood, we used summary GWAS statistics obtained from the eQTLGen meta-analysis 4 for 17,422 genes. In that study only 10,317 trait-associated SNPs, identified from GWAS Catalog and Immunobase, were tested by eQTLGen for trans-association with gene expression. We used the GENOSCORES platform 19 to calculate genome-wide trans-scores for expression of each gene and circulating levels of each protein as follows.
1. Summary statistics were filtered at p < 10 À5 . For each clump of variants containing at least one SNP with p < 10 À6 and separated by at least 1 Mb from other such clumps, regression coefficients for all SNPs in that clump were included in a locus-specific weights vector. This threshold is less stringent than would be used for declaring association in a genome-wide study, as our objective was to construct predictors rather than to test for association. 2. For each clump of SNPs, the correlations between genotypes for the retained variants were extracted from the European ancestry subset of the 1000 Genomes panel. Each locus-specific weights vector was premultiplied by the inverse of this correlation matrix to adjust for linkage disequilibrium. This adjustment approximates the regression coefficients that would be obtained in a multiple regression analysis of the individual-level GWAS dataset. The locus-specific score was calculated for each individual as the dot product of the individual's genotypes and the adjusted weights vector. 3. Each locus-specific score was classified as cis-if the distance from the clump to the transcription site of the respective gene was less than or equal to 50 kb, cis-x if the distance was 50 kb to 5 Mb, and trans-if the distance was more than 5 Mb. The genome-wide trans-score was computed as the sum of locus-specific trans-scores for the target gene. As the HLA region is a hotspot for trans-eQTLs for genes involved in the immune system, 20 and associations of these trans-eQTLs with T1D are heavily confounded by the direct effects of HLA antigens on T1D risk as described below, the HLA region (from 25 to 34 Mb on chromosome 6) was excluded from the computation of genome-wide transscores. This procedure generated 4,824 cis-scores for 4,702 genes and 4,102 genome-wide trans-scores.
Each of these scores was tested for association with T1D in a logistic regression model with T1D as dependent variable and the first three genotypic principal components as covariates. All scores were scaled to have standard deviation of 1 so that the log odds ratios represent the difference in log odds associated with a difference of 1 standard deviation in the score. The information for discrimination, in natural log units, can be calculated as half the square of this standardized log odds ratio. 21 Where SNPs that are associated with T1D have pleiotropic transeffects on gene expression, we expect genome-wide trans-scores for these genes to be associated with T1D even if they are only ''bystander'' genes that are not in the causal path from SNPs to T1D. Where a trans-score for expression of a gene is associated with T1D, that gene is more likely to be a core gene if the score is the sum of multiple locus-specific trans-scores. This is because aggregating effects of multiple trans-eQTLs on a target gene will amplify the signal of a core gene but not the noise generated by trans-effects on the expression of a ''bystander'' gene. As an index of the effective number of unlinked trans-eQTLs contributing to each genome-wide trans-score, we calculated the diversity index The genes were identified using T1D association with genome-wide trans-scores, locus diversity, and whether the variation in the gene is a known monogenic cause of T1D. or Hill number. 22 For each gene, the diversity index was computed from the variances s 1 ; .; s K of the K locus-specific trans-scores This index can take values from 1, if one of the eQTLs has much larger variance than the others, to K, if the variances of the locus-specific trans-scores are equal. Figure 1 shows a flow chart of this process for selection of putative effector genes.
The procedure described above for effects on gene expression was repeated for effects on circulating proteins. We extracted summary GWAS statistics from five studies of circulating proteins: 48 proteins in plasma or serum on the Bio-Rad cytokine panel, 23 1,478 proteins in plasma on the SomaLogic panel, 24 4,719 proteins in plasma on the SomaScan v4 panel, 25 83 proteins in plasma on the Olink cardiovascular panel, 26 and 70 proteins in plasma on the Olink inflammation panel. 27 We computed cis-scores for 889 proteins and trans-scores for 2,745 proteins. The aggregation of locus-specific trans-scores into genome-wide trans-scores was performed in the same manner as for eQTLs. For putative core genes identified by this procedure, we tested for interaction with the effects of the HLA region on T1D risk by using a case-only test of association between the trans-score and the risk score for the HLA region. We constructed summary risk scores for the HLA region on the basis of eight variables derived from five SNPs that tag class I and class II haplotypes as described by Oram et al. 28 The weights for these eight variables were learned by fitting a multiple logistic regression model to the case-control study.
To test whether genetic effects on the frequencies of immune cell types could explain the associations of T1D with genomewide trans-scores for gene expression, we tested for association of T1D with genome-wide scores for immune cell phenotypes. We used summary statistics from two studies of immune cell phenotypes in peripheral blood 29,30 with sample sizes of 497 and 1,629 individuals to calculate genome-wide scores for each immune cell phenotype and tested these scores for association with T1D.

SNP associations with T1D
We first undertook a conventional SNP-by-SNP GWAS analysis of this case-control study. Figure 2A shows a Manhattan plot of the SNP associations with T1D.
Clumps of SNPs documented in the Type 1 Diabetes Knowledge Portal as containing T1D-associated SNPs are highlighted. Of the regions detected at genome-wide significance in this study, two have not previously been described in detail: the ADAM30 region on chromosome 1 and the COLEC10 region on chromosome 8. The top SNP in the ADAM30 region was rs406767 at 120.01 Mb, within the NOTCH2 transcription site. T1D was associated with the cis-eQTL score for expression of NOTCH2 in whole blood (standardized odds ratio 0.94, p ¼ 5 3 10 À4 ) and with the cis-pQTL score for REG4 (standardized odds ratio 0.92, p ¼ 1310 À5 ). The top T1D-associated SNP in the region labeled COLEC10 is at 119.05 Mb within COLEC10. The only cis-QTL score associated with T1D in this region is a pQTL score for TNFRSF11B (transcription site 119.79 to 119.81 Mb, standardized odds ratio 0.92, p ¼ 2 3 10 À5 ).
To investigate whether cis-effects on gene expression or protein levels could help with identifying the local genes mediating the effects of T1D-associated SNPs, we tested Manhattan plots for SNPs, locusspecific trans-scores, and counts of transscores originating from each locus (A) Manhattan plot of SNP associations with T1D in case-control study. Minus log 10 p value truncated at 20. SNPs within clumps of T1D-associated SNPs reported by the Type 1 Diabetes Knowledge Portal are highlighted in blue. A clump is defined by flanking regions of 20 kb if a single SNP is reported. Each clump with lead SNP ( À log 10 p > 6) is labeled with the nearest T1D-associated gene.
(B) Manhattan-like plot of T1D associations with locus-specific trans-scores, excluding the HLA region. Scores for which Àlog 10 p > 6 are labeled as in (A). (C) Number of trans-eQTLs that each genomic region contributes to T1D-associated genome-wide trans-scores. Each set of overlapping SNP clumps contributing to T1D-associated trans-scores defines a region. Regions contributing to more than two scores are labeled with previously reported T1D-associated genes.
all cis-eQTL and cis-pQTL scores for association with T1D, excluding scores for genes within the HLA region. Of 4,801 cis-eQTL scores and 1,003 cis-pQTL scores tested, 11 were associated with T1D at p < 10 À6 , as shown in Table 1. All these T1D-associated cis-eQTLs were in regions where T1D-associated SNPs have previously been reported, but the cis-eQTL target was not always the same gene as that to which the SNP association in that region had been attributed.
d The association of T1D with SNPs in the 12q13 region has been attributed to ERBB3. 31,32 Table S1 shows however that the strongest cis-QTL score associations with T1D in this region were for RAB5B, ERBB3, and IL-23A.
d The association of T1D with SNPs in the 12q24.12 region (111.3-111.9 Mb) containing ATXN2 has been attributed to a non-synonymous SNP in SHB23. 31 The T1D-associated SNPs in the 12q24 region lie in a range from 111.39 to 112.5 Mb, corresponding to a linkage disequilibrium block in European populations. Tables 1 and S1 show, however, that the transcription sites of the genes with T1D-associated cis-eQTL or cis-pQTL scores extend over a broader region from 108.56 to 114.35 Mb (bands 12q24.11 to 12q24.13), including ISCU, SELPLG, PPTC7, GLTP, TRAFD1, SDSL and TBX5.
T1D was strongly associated with an extended cis-pQTL score for NECTIN2 generated by SNPs from 48.59 to 48.74 Mb on chromosome 19, 3.7 Mb downstream of the transcription site and spanning FUT2. Whether this should be classified as an extended cis-pQTL is uncertain, as the 3D Genome Browser 33 (see web resources) shows no Hi-C interactions between the pQTL and the transcript region in a relevant cell line (Liver-STL011). This extended cis-pQTL was also a trans-pQTL for CCL15 (Table S6), contributing to the aggregated trans-score for this chemokine, which was associated with T1D as discussed below.
Associations of T1D with genome-wide trans-scores for gene expression We next tested for associations of T1D with trans-scores for gene expression. By far the strongest associations were those generated by trans-eQTLs in the HLA region (Table S2). Of the 1,041 genes for which trans-scores could be calculated from SNPs in this region, 181 were associated with T1D at p < 10 À6 . These genes include the TRAV and TRBV gene families that encode T cell receptors that recognize the antigens encoded by HLA genes. These associations of T1D with HLA region-wide trans-scores are so heavily confounded by the direct effects of HLA antigens on T1D risk that they are difficult to interpret. To control this confounding, the HLA region was excluded from the aggregation of locus-specific trans-scores into genomewide scores.
We next tested all 4,103 genome-wide trans-scores for association with T1D. Of these, 309 were associated at p < 10 À6 (Table S3). Figure 2B shows that 44 regions contributed eQTLs to the 219 genome-wide trans-scores that were associated with T1D at p < 10 À9 . Of the 32 regions contributing to two or more of these 219 T1D-associated genome-wide scores, ten contained SNPs previously reported as GWAS hits for T1D. Four regions containing T1D-associated SNPs contributed to more than five of these scores: PTPN22 on chromosome 1, BACH2 on chromosome 5, IKZF1 on chromosome 7, and the 12q24 region containing ATXN2.
For further analysis of the genes with strongest evidence for a causal relationship of function to T1D, the T1D-associated genome-wide trans-scores were restricted to those that met the following criteria: (1) effective number of eQTLs greater than 5 and p < 10 À9 or (2) monogenic cause of autoimmune diabetes and p < 10 À6 . Seven putative core genes were designated on the basis of criterion (1); STAT1 and FOXP3 were designated as putative core genes on the basis of criterion (2). Table 2 summarizes the associations of T1D with the nine genes designated as putative core genes. Three of these genes-CTLA4, STAT1, and FOXP3-have been previously identified as causing monogenic autoimmune Table 1. Genes for which cis-eQTL or cis-pQTL score is associated with T1D at p < 10 À6 diabetes. 34 For four of these genes-CTLA4, STAT1, CD5, and IL10RA-T1D associations with SNPs in or near the gene have been reported by the Type 1 Diabetes Knowledge Consortium. For CTLA4, both the cis-score and the genome-wide trans-score for gene expression were associated with T1D. Although the cis-effect is in the opposite direction to the trans-effect, this is explicable as the T1Dassociated SNPs in CTLA4 alter the splicing of the transcript. 35,36 To investigate the specificity of the associations of genome-wide trans-scores with T1D, we examined the correlations between these scores in the control group. Figure S1 shows that the trans-scores for the putative core genes were not highly correlated with each other. The only other gene with a trans-score that was highly correlated (r 2 > 0.7) with a trans-score for any of the putative core genes was PARP3, which was highly correlated (r ¼ 0.86) with the score for LGALS3BP. Table 3 shows that 14 trans-eQTLs within 200 kb of clusters of SNPs that have been reported as T1D associated contributed to one or more of the genome-wide trans-scores for these nine putative core genes. Figures 3 and 4 show that the trans-eQTLs contributing to the genome-wide scores for these nine genes include some loci that were weakly associated with T1D but did not reach genome-wide significance in this study or in previously reported studies.

Gene symbol
To investigate whether these associations of T1D with genome-wide trans-scores could be explained by variation in the profile of immune cell types, we tested for association of T1D with genome-wide scores for immune cell phenotypes. Tables S4 and S5 show that T1D was not associated with genome-wide scores for any immune cell

phenotypes, including the levels of regulatory T cells as percentages of T cells and of CD4 þ T cells.
Associations of T1D with genome-wide trans-scores for circulating protein levels We repeated the procedure for analysis of aggregated trans-effects by using the scores for trans-effects on circulating protein levels. This yielded five putative core genes-EIF4G3, CCL19, CRTAM, LIN7B and NCR1-on the basis of association of T1D at p < 10 À9 with a transscore for the gene product and effective number of trans-pQTLs > 5 and another nine-CD5L, CD48, FCGR3B, GCG, CXCL9, LAG3, CCL15, ICAM2, and BPIFA2-at the less stringent threshold of p < 10 À6 (Table 4). Of these, only GCG, which encodes glucagon, has been previously reported as T1D associated in a conventional SNP-by-SNP GWAS. 16 Correlations between these scores in the control group are weak except for EIF4G3 and LIN7B (r ¼ 0.74). For LAG3, the expression score (Table S3) also was strongly associated with T1D (p ¼7 3 10 À10 ), but as the effective number of QTLs for the expression score was only 2.9, it was not included in Table 2 as a putative core gene. Table S6 shows that many of the trans-pQTLs contributing to the T1D-associated genome-wide trans-scores for these proteins overlapped with regions in which SNP associations with T1D have previously been reported. Thus, the trans-pQTLs for circulating CXCL9 (chemokine ligand 9) levels included SNPs within 200 kb of the transcription sites of PTPN22, BCL11A, CCR9, KIAA1109, TULP1, INS, ATXN2, GPR183, and UBASH3A. All these genes are recorded by the Type 1 Diabetes Knowledge Consortium as associated with T1D.
Criteria for including genes based on association of T1D with trans-score for expression are shown in the header rows. The association of T1D with the cis-score for that gene is shown if a cis-score was constructed and was associated with T1D at p < 0.001. Any SNP association reported by the Type 1 Diabetes Knowledge Consortium is shown if the top SNP in the clump was within 200 kb of the transcription site of the target gene. This criterion excludes a clump of T1D-associated SNPs containing UBE4A, 263 kb downstream of IL10RA. The gene for the SNP association is that identified as the candidate or nearest gene in the original study.

Discussion
On the basis of aggregating the effects of multiple trans-eQTLs, this study identifies nine top candidates as putative core genes for T1D. Trans-eQTL effects of T1D-associated SNPs on STAT1 have been noted previously. 4,37 Although four of these are validated by evidence of cis-effects as defined above, the only one that would have been identified as a core gene for T1D in a conventional SNP-by-SNP GWAS analysis is CTLA4. Of these putative core genes, seven are involved in induction and activity of CD4 þ regulatory T cells (Tregs). FOXP3 is the canonical marker and regulator of Tregs. 38 CTLA4 is expressed on Tregs where it inhibits CD28-mediated activation of T cells. 39 STAT1 mediates the inhibition by IFNg of induction of Tregs 40 ; gain-of-function mutations in STAT1 cause autoimmune diabetes. CD5 promotes induction of Tregs by blocking mTOR activation. 41 IL10RA activates STAT3 in Tregs, which suppresses T h 17 inflammatory responses. 42 SNPs in IL10, which encodes the ligand, are associated with T1D. MEOX1 overexpression reprograms Tregs to acquire a transcriptional profile associated with tumor infiltration. 43 CD247 encodes the CD3z chain subunit of the T cell receptor complex. In congenic strains derived from the NOD mouse model, the NOD allele at Cd247 is associated with lower expression of CTLA-4 in Tregs and with higher rates of autoimmune diabetes. 44 The other two putative core genes are involved in the innate and acquired immune response to lipids (usually of microbial origin).
LGALS3BP inhibits transforming growth factor b-activated kinase 1-dependent activation of nuclear factor-kB by lipopolysaccharides. 45 CD1E is expressed intracellularly, where it modulates the loading of lipid antigens onto other CD1 isotypes that are displayed at the cell surface. 46 As T1D was not associated with genotypic scores for the absolute or relative levels of Tregs or with scores for other immune cell phenotypes, it is unlikely that the associations of T1D with trans-scores for genes such as FOXP3 that are expressed by Tregs can be explained by effects on the abundance of this cell type, although this cannot be ruled out, as the summary statistics we used to calculate scores for immune cell phenotypes were based on studies with relatively modest sample sizes.
The proportion of non-HLA heritability of T1D that is accounted for by the trans-scores in Table 2 is modest. For the FOXP3 trans-score, for instance, the information for discrimination (calculated as half the square of the standardized log odds ratio) is 0.026 natural log units, which is about 2% of the non-HLA genetic information for discrimination for T1D, assuming that the total genetic information for discrimination is about 2.5 (equivalent to sibling recurrence risk ratio of 12) and that half of this is contributed by genes outside the HLA region. The current version of the eQTLGen meta-analysis tested only 10,317 SNPs previously identified as trait-associated for trans-association with gene expression. As this will have missed many trans-eQTLs, our study is likely to underestimate the contribution of core genes to T1D.
Another limitation of the current version of the eQTL-Gen dataset is that the gene expression assays do not trans-eQTL regions are shown if they are within 200 kb of a clump of T1D-associated SNPs and contribute to one or more trans-scores for the putative core genes in Table 2.
necessarily distinguish between splicing variants that may be differentially regulated by trans-eQTLs. Both FOXP3 47 and CTLA4 35,36 are expressed in alternatively spliced isoforms that have different functional effects: this may explain why the trans-score associations of these genes with T1D are in the opposite direction to that expected given that loss-of-function mutations in these genes cause autoimmune diabetes. The overlap of trans-eQTLs for these putative core genes with regions in which SNP associations with T1D have been reported suggests that some SNP associations that have no obvious interpretation in terms of cis-effects,  such as the association of T1D with SNPs in the PLEKHH2 and NSF regions, may be explicable as trans-effects on the expression of core genes. We can tentatively identify four regions containing ''peripheral master regulators'' for T1D on the basis that they contribute trans-effects to multiple genes for which genome-wide trans-scores are associated with T1D. These regions are PTPN22, BACH2, IKZF1, and the 12q24 region containing ATXN2 and SH2B3. Not surprisingly, SNPs in these regions containing master regulators are associated with multiple autoimmune disorders. 48 Applying the same approach to circulating proteins identifies a different set of putative core genes, most of which are biologically relevant to T1D. One of the most interesting is NCR1, which may underlie the tissue specificity of the autoimmune reaction in T1D. NCR1 encodes the NKp46 receptor expressed by natural killer (NK) cells, which recognizes an unknown ligand expressed by pancreatic b cells; knockout of Ncr1 or blocking of the gene product inhibits the development of autoimmune diabetes in the NOD mouse. 49 CRTAM encodes a receptor expressed on the cell surface of activated NK and CD8 þ T cells. In the RIP-mOVA mouse model, knockout of Crtam in the donor strain inhibits induction of autoimmune diabetes by exogenous CD8 þ T cells. 50 LAG3 encodes an immune checkpoint receptor; deletion of this gene in the NOD mouse accelerates development of autoimmune dia-betes. 51 CXCL9, CCL15, and CCL19 encode chemokines. CXCL9 is expressed in pancreatic b cells in response to pro-inflammatory cytokines, 52 and deletion of the gene encoding its receptor CXCR3 accelerates diabetes in the NOD mouse. 53 CCL15 is expressed in the intestinal mucosa as an antibacterial peptide. 54 CCL19 is the ligand for CCR7, a receptor expressed on lymphocytes; in the NOD mouse, migration of T cells into inflamed islets is dependent on CCR7. 55 CD48 encodes a signaling molecule that regulates innate and adaptive immune responses through binding to the CD244 receptor on NK and other cytotoxic cells. 56 CD5L, initially reported as an apoptosis inhibitor, binds pathogen-associated molecular patterns, suggesting a role in innate immune response. 57 EIF4G3 regulates translation in lymph node stromal cells of genes that affect antigen presentation. 58 The sets of putative core genes identified by eQTL and pQTL analyses do not overlap, with the exception of LAG3. This is not surprising, as the genes identified through eQTL analysis are mostly transcription factors or receptors expressed on the surface of immune cells, while pQTL analysis can detect only genes that affect circulating protein levels. We would not expect leukocytes in peripheral blood to contribute much to the levels of these proteins.
Although the main focus of this paper is the identification of effector genes and master regulators via trans- effects, we have also used cis-eQTL and cis-pQTL associations to investigate cis-acting effects on T1D and to identify the genes most likely to mediate some previously reported SNP associations with T1D. For the 12q13 region containing ERBB3, a plausible candidate for the mediator is IL23A, which encodes a subunit of interleukin-23. This cytokine induces immune-mediated b cell damage and diabetes in a mouse model. 59 The association of T1D with SNPs in FUT2 has been widely attributed to FUT2, as one of the top T1D-associated SNPs is a non-synonymous variant that determines ABO blood group antigen secretor status. However, no biological mechanism for the association of FUT2 with T1D has been established. The cis-pQTL analysis suggests NECTIN2, which modulates T cell signaling by binding to the CD226 receptor, as a possible mediator. SNPs in CD226 are associated with T1D, and deletion of this gene protects against diabetes in the NOD mouse. 60 The approach used in this study-using summary GWAS results on gene expression to construct trans-only genotypic scores and then testing these for association in an individual-level study of the disease or trait of interest-complements the conventional SNP-by-SNP analysis of a GWAS, where detection of a SNP association with disease is followed by attempts to identify the gene that mediates this association. Our approach falls into the general category of transcriptome-wide association study designs. As our objective is specifically to detect effector genes through which the effects of many weak trans-eQTLs are mediated, we have constructed a procedure to maximize the detection of such genes on the basis of using summary-level data from a large study of gene expression to calculate genome-wide scores that aggregate the trans-effects on expression of each gene and then testing these scores for association with the disease under study. A related approach is ''expression quantitative trait score'' (eQTS) analysis in which summary-level GWAS data are used to construct genome-wide polygenic scores for traits of interest that were tested for individual-level association with gene expression in the eQTLGen dataset. 4 One limitation of the eQTS method is that it does not distinguish transand cis-effects on expression of each gene. A more fundamental limitation may be that testing for association of expression of each gene with a polygenic score for the disease is less statistically powerful than testing the aggregated effects of weak trans-eQTLs directly because any associations of disease with trans-eQTL effects on expression of a gene are diluted by the other SNPs included in the polygenic score. Another related approach that uses only summary statistics has been described as ''aggregative trans-eQTL analysis,'' which computes canonical correlations between linear combinations of SNPs associated with expression of each gene and linear combinations of SNPs on the trait of interest. 61 A limitation of that approach is that SNP associations are coded only as present/absent: the size and direction of effects on gene expression and the trait of interest are ignored.
These results support the omnigenic hypothesis in that selecting genome-wide trans-scores that are generated by multiple trans-QTLs and are strongly associated with T1D identifies a set of genes that were mostly not detected in a conventional SNP-by-SNP analysis but are biologically highly relevant to T1D. Some of these putative effector genes are validated by cis-effects, by experimental models, or by effects of other genes in the same pathway. For clarity, we suggest a revision of terminology: ''core genes'' can include both master regulators and effector genes, and the ''omnigenic'' hypothesis can be more precisely denoted as the sparse effector gene hypothesis. The hypothesis has implications for how GWASs should be analyzed, suggesting that analysis should focus on identifying effector genes through aggregated trans-eQTL effects rather than on identifying which SNP, in a cluster of SNPs with tiny effects on the outcome, is the functional variant. Core gene effects may be more robust than SNP associations to variation between populations. 62 The method described here is readily applicable to other disorders of the immune system for which whole blood is a relevant tissue in which to study gene expression. Wider application of this approach will require availability of more comprehensive summary-level results from large GWASs of gene expression in various tissues and measurements of genetic effects on splicing variants and post-transcriptional modification.

Data and code availability
All code used in this analysis is available at https://github. com/molepi-precmed/trans-qtls. Summary-level data is available from https://doi.org/10.5281/zenodo.7786862. Accredited researchers may apply to the Public Benefit and Privacy Panel for access to the individual-level data. The application should be submitted via HSC-PBPP website https://www.informationgovernance.scot.nhs.uk/ pbpphsc/. The platform used to compute locus-specific genotypic scores and the database of published GWAS summary statistics are accessible on the https:// genoscores.cphs.mvm.ed.ac.uk/ platform. The corresponding R package will be shared on request. Annotation of the known T1D associations was taken from Type 1 Diabetes Knowledge Portal: https://t1d.hugeamp.org.
Genotyping was supported by the Juvenile Diabetes Research Fund (17-2013-7). Generation Scotland received core support from the Chief Scientist Office of the Scottish Government Health Directorates (CZD/16/6) and the Scottish Funding Council (HR03006). The development of the GENOSCORES platform was supported by a Springboard Award (SBF006/1109) from the Academy of Medical Sciences, supported in turn by the Wellcome Trust, the UK Government Department of Business, Energy and Industrial Strategy, the British Heart Foundation, and Diabetes UK.

Declaration of interests
The authors declare no competing interests.