Laser capture microdissection of human pancreatic islets reveals novel eQTLs associated with type 2 diabetes

Objective Genome wide association studies (GWAS) for type 2 diabetes (T2D) have identified genetic loci that often localise in non-coding regions of the genome, suggesting gene regulation effects. We combined genetic and transcriptomic analysis from human islets obtained from brain-dead organ donors or surgical patients to detect expression quantitative trait loci (eQTLs) and shed light into the regulatory mechanisms of these genes. Methods Pancreatic islets were isolated either by laser capture microdissection (LCM) from surgical specimens of 103 metabolically phenotyped pancreatectomized patients (PPP) or by collagenase digestion of pancreas from 100 brain-dead organ donors (OD). Genotyping (> 8.7 million single nucleotide polymorphisms) and expression (> 47,000 transcripts and splice variants) analyses were combined to generate cis-eQTLs. Results After applying genome-wide false discovery rate significance thresholds, we identified 1,173 and 1,021 eQTLs in samples of OD and PPP, respectively. Among the strongest eQTLs shared between OD and PPP were CHURC1 (OD p-value=1.71 × 10-24; PPP p-value = 3.64 × 10–24) and PSPH (OD p-value = 3.92 × 10−26; PPP p-value = 3.64 × 10−24). We identified eQTLs in linkage-disequilibrium with GWAS loci T2D and associated traits, including TTLL6, MLX and KIF9 loci, which do not implicate the nearest gene. We found in the PPP datasets 11 eQTL genes, which were differentially expressed in T2D and two genes (CYP4V2 and TSEN2) associated with HbA1c but none in the OD samples. Conclusions eQTL analysis of LCM islets from PPP led us to identify novel genes which had not been previously linked to islet biology and T2D. The understanding gained from eQTL approaches, especially using surgical samples of living patients, provides a more accurate 3-dimensional representation than those from genetic studies alone.


INTRODUCTION
Genome-wide association studies (GWAS) meta-analyses have revealed >200 loci associated with type 2 diabetes (T2D) and associated traits, such as fasting glucose [1e5]. Recently, new statistical methods such as fine-mapping have identified further 40 loci [6]. However, deciphering the causal variants and making inferences from GWAS to physiology is still a challenge. Firstly, few GWAS single nucleotide polymorphisms (SNPs) are near insightful biological candidate genes. More importantly, most SNPs fall within the noncoding, regulatory regions of the genome [7], often far from coding regions or in regions where more than one gene lies. To overcome these complexities, combining GWAS with expression data to produce expression quantitative trait loci (eQTLs) has become a powerful tool to shed light into the causative mechanisms of these genetic associations [6,8e10]. However, one limitation is that whilst there is a considerable 1 overlap across human tissue transcriptomes, many eQTLs are cell type specific. Therefore, it is crucial to analyse eQTLs from appropriate tissues, e.g. those functionally involved in the pathology. With regard to diabetes, only a few studies have investigated eQTLs in pancreatic islet samples [11,12]. However, an insufficient supply of human islets, and technical issues related to the origin of the samples (i.e. from braindead organ donors rather than from surgical living patients), have restricted the performance of such studies. Here, we tested a total of 203 islet samples from European subjects with the aim to identify islet cis-eQTLs related to T2D.

Cohort
Blood and pancreatic tissue samples were collected from a total of 203 patients from two independent cohorts. The first cohort comprised of 100 islet samples isolated by limited proteolytic digestion of pancreas from brain-dead organ donors (OD), which included 19 subjects with T2D [13]. The second cohort consisted of 103 islet samples isolated by laser capture microdissection (LCM) from the healthy margins of surgical fragments of metabolically phenotyped pancreatectomized patients (PPP), who underwent surgery for different pancreatic diseases [13]. In the PPP cohort, 32 were normoglycaemic (fasting glycaemia <7.0 mmol/l; HbA1C 6.5%, glycaemia at 2 h after presurgical oral glucose tolerance test [OGTT] <7.8 mmol/l), 15 had impaired glucose tolerance (IGT, fasting glycaemia <7.0 mmol/l; HbA1C 6.5%, OGTT at 2 h of !7.8 to <11.1 mmol/l), 36 had T2D (fasting glycaemia !7.0 mmol/l; HbA1C >6.5%, history of diabetes for >1 year) and 20 had type 3 diabetes (T3D), based on clinical history and pre-surgical laboratory tests according to the American Diabetes Association (ADA) guidelines. T3D was defined as a secondary form of diabetes with an onset not longer than 1 year prior to the appearance of the symptoms related to the primary pancreatic disease. Samples were collected as part of the IMIDIA consortium, and more information on the donors in these cohorts has been recently reported [13].

Islet collection and RNA extraction
Specific details about the methodologies for islet retrieval and RNA extraction have been previously described [13]. Briefly, RNA was extracted from OD islet samples with the PicoPure RNA Isolation Kit, following the manufacturer's protocol. Islets were extracted using collagenase extraction methods [14]. Islets of PPP were retrieved from cryopreserved surgical samples by LCM with a Zeiss Palm MicroBeam system. RNA was extracted using the Arcturus PicoPure Isolation kit.
2.3. Expression microarray RNA was analysed for gene expression profiling using the Affymetrix Human Genome U133 Plus 2.0 Array (deposited under the accession number GSE76896), according to the manufacturer's protocol. Data from Gene Expression Omnibus (GEO) and annotation data have been imported using R package affy. The array data include 54,210 probe sets covering over 47,000 transcripts and splice variants. Expression data was normalised using the Robust Multichip Analysis (RMA) method as implemented in R package affy [15]. Possible batch effect was corrected for using the "ComBat" approach implement in R package sva [16], using only the batch information. Differential expression analysis to compare non-diabetic versus T2D samples was performed using Limma (Bioconductor R package) with age and gender as covariates. Multiple testing was accounted for using Benjamini-Hochberg method to correct p-values. PPP and OD samples clustered separately ( Supplementary Fig. 4a), but there was no difference in clustering in other factors, such as gender ( Supplementary Fig. 4b). For PPP, as the HbA1C levels were similar between ND controls and IGT, both samples were categorised as controls and compared to T3D and T2D samples.

DNA extraction and genotyping
A total of 203 DNA samples were extracted from either blood or pancreatic tissue and isolated using the DNeasy Blood & Tissue kit (Qiagen, Germany). All samples were genotyped using the Illumina Omni2.5M array and run on the Illumina iScan platform at the Imperial College London Centre (Hammersmith Campus, Imperial College London, UK). Genotypes were called using the Genome-studio software. Quality-control was performed and SNPs were kept for further analysis according to the following thresholds: 1/maf >0.05, 2/Hardy-Weinberg equilibrium >0.001 and 3/call rate >0.95. Prephasing was performed with ShapeIt (v2.r790) and imputation with Impute2 (v2.3.2) using 1,000 genomes panel (phase 3). This lead to a dataset of 1,233,520 SNPs and imputation lead to a further 7,574,416 SNPs, for a total of >8.7 million SNPs analysed. Principal component analysis (PCA) for ethnicity using genomics data from 1,000 genomes confirmed the vast majority of samples were of European descent ( Figure 1A).
2.5. eQTL analysis eQTL analysis was carried out on all 203 samples. The eQTL analysis was performed using FastQTL software [17], with gender and age as covariates. P-values were computed using an adaptive permutation pass approach with a permutation number set from 1,000 to 10,000. We performed a genome-wide genotyping and expression analysis in OD and PPP samples to identify eQTLs ( Figure 1B). At first, a ciswindow was defined as 1 Mb between SNPs and expression probes and then reduced to 500 Kb based on the observed p-values distribution ( Supplementary Fig. 2). All eQTLs with a FDR<5% were considered to be significant. The eQTL regions were annotated to published binding sites of human islet transcription factors [18], which were subdivided into five classes: C1 (promoters), C2 (inactive enhancers), C3 (active enhancers), C4 (CTCF-bound sites) and C5 (others), which was not CTCF-bound and was not bound to a known histone modification.

Bioinformatics
The Database for Annotation, Visualisation and Integrated Discovery (DAVID) is a software tool that clusters proteins into functional annotations to facilitate understanding of complex datasets [19] (https:// david.ncifcrf.gov/). The DAVID tool was used to cluster genes to identify enriched gene ontological functions. P-values were calculated using Fisher's exact test to score the enriched gene ontology terms. In addition, the OD and PPP eQTL datasets were exported to Ingenuity Pathways Analysis (IPA) Software (Ingenuity Systems, Incorporated, California, USA; www.ingenuity.com). A core analysis was performed to identify pathways. The pathways were organised according to significance using a Fisher's exact test, with a cut-off of a elog (pvalue) < 1.3. A z-score was used to predict downstream regulator activity (activation or inhibition) patterns. For the upstream analysis, a p-value < 0.01 was used for relationships between regulators.

Identification of cis-eQTLs in human islet samples from two distinct cohorts
Our analysis resulted in the identification of a total of 1,173 and 1,021 significant eQTLs in the samples of OD and PPP, respectively (Supplementary Table 1). We found that only 60% of eQTL genes were shared between the two cohorts and were consistent in the direction of the effect. These differences could conceivably be attributed to differences in the method for islet isolation between the two cohorts. Among the strongest eQTLs shared between OD and PPP were CHURC1 Churchill domain containing 1 (CHURC1) eQTLs have recently been reported to be associated with T2D and obesity in both adipose and muscle tissues [20,21], suggesting a pleiotropic effect of this gene in T2D pathophysiology. Phosphoserine phosphatase (PSPH) catalyses the last step of serine biosynthesis from carbohydrates [22]. Other strong eQTLs unique in either dataset were ACO1 in PPP (p ¼ 1.47 Â 10 À25 ) and ELP5 (p-value ¼ 5.71 Â 10 À24 ) in OD samples. Aconitase 1 (ACO1) is an B. An overview of the methodology utilized to obtain eQTLs from islets of OD and PPP subjects were isolated by limited digestion from OD material and by LCM from PPP material. Acis-window of 500 kb was used with adjustment on gender and age and a false discovery rate (FDR) of <5% was used as a cut-off. essential enzyme involved in the TCA cycle and cellular iron homeostasis [23]. ELP5 is a member of the RNA polymerase II elongator complex, which plays a role in chromatin remodelling and histone acetylation [24]. Of the identified eQTLs in OD and PPP, 42 were transcription factors. This was determined using the TFcheckpoint, a resource for transcription factors that provides a comprehensive list of transcription factor annotations with evidence from the literature [25]. In addition, a total of 46 in OD and 46 in PPP non-coding RNAs, including long noncoding and non-protein coding RNAs, were found to be significantly affected by nearby SNPs. Of these, 26 were shared between the OD and PPP datasets. We compared our analysis to published eQTLs in islets from OD: one study in 89 OD identified 616 eQTLs [11]; another study in 118 OD found 2,341 eQTLs [12]. While the reproducibility rate between the two studies was 43% [12], we found a 28% (OD) and 29% (PPP) shared genes with van de Bunt et al. [12] and 23% (OD) and 18% (PPP) with Fadista et al. [11]. Of these, 24 genes were shared between our OD and PPP samples and those in these previous datasets, consistent in direction of effect (Table 1; Figure 2B). This low number is likely due to substantial differences in RNA extraction and transcriptomic methods among these studies. For instance, we used microarrays for transcriptomic analysis, whereas the other studies relied on RNA sequencing. Nevertheless, 42% of the shared genes were among the top 100 identified eQTLs in our study, including CHURC1, ERAP2, and LDHC.

Regulatory and biological interpretations of eQTL genes
Consistent with these regions being regulatory, we annotated the eQTL datasets with active open chromatin maps in islets (promoters, active enhancers and CTCF regions) [15] and found 102 and 93 sites overlapping with open chromatin regions in OD and PPP, respectively ( Figure 2C, Supplementary Tables 2e3). Of these, 15 were shared between the two datasets, with the majority being in active enhancer sites and clusters. One shared gene relevant for beta-cell biology was RPH3AL/NOC2, which has been shown to be crucial for exocytosis of insulin in pancreatic beta cells [26]. The gene ontology (GO) tool was used to make inferences on the biological function of the eQTL genes. GO identified ontologies that relate to beta cell function included cell junction, positive regulation of GTPase activity and glutathione metabolism (Supplementary Fig. 3). In addition, to make inferences about the identified eQTL genes, we utilised tools in the Ingenuity Pathway Analysis (IPA) software. None of the top pathways was shared between OD and PPP samples, pointing to differences in the donors and the procedure for islet isolation ( Supplementary Fig. 4). Interestingly, the transcriptional regulator of pancreatic beta-cell function and maturity-onset diabetes of the young (MODY4) gene HNF4A was found to the most significant upstream regulator in the IPA analysis for both PPP (p-value ¼ 0.003; zscore ¼ 0.45) and OD (p-value ¼ 0.0001; z-score ¼ 0.98) (Figure 3). HNF4A modulates regulatory elements in the promoters and enhancers of genes involved in glucose, fatty acid and cholesterol metabolism. It regulates gene expression in pancreatic beta cells to achieve glucose homeostasis and activates insulin genes both directly and indirectly. Our data show that many genes have a relevance for carbohydrate and lipid metabolism in addition to their association with T2D.

eQTL genes associated with T2D and associated traits
We then focused our analysis on GWAS loci and genes for T2D and associated traits. First, we found an enrichment for GWAS loci in eQTLs in MAGIC fasting glucose (OD p ¼ 4.8 Â 10 À4 ; PPP p ¼ 4.1 Â 10 À4 ), fasting insulin (OD p ¼ 0.013; PPP p ¼ 0.059), HOMA-IR (OD p ¼ 0.019; PPP p ¼ 0.018) and HOMA-B (OD p ¼ 0.001; PPP p ¼ 0.0025) and DIAGRAM (OD p ¼ 0.0039; PPP p ¼ 3.7 Â 10 À4 ) using a Fisher's exact test (Supplementary  Table 4). We then investigated whether T2D genes were also identified in our eQTL study and found a total of eleven genes ( Table 2). Seven of these overlapped GWAS genes associated with T2D alone (AP3S2, GPSM1, MAEA, TLE4, TMEM163, SSR1, and UBE2Z), three with T2D associated traits alone (IGF1R for fasting plasma glucose, ABO for disposition index and UHRF1BP1 for fasting plasma insulin), and one for both T2D and fasting glucose levels (ADCY5). Of these genes, AP3S2 and IGF1R were represented in both our OD and PPP datasets. AP3S2, encoding for an adaptor protein, was the only gene replicated in both our datasets in addition to reported eQTLs in islets [11,12]. ADCY5 is a member of the adenylate cyclase family, which has previously been shown in a knockout study to be involved in glucose stimulation and insulin secretion [27]. Interestingly, we have previously demonstrated that SSR1 expression is enriched in human islets and isolated beta cells [28]. In addition, for both OD and PPP datasets, we found strong associations with UBE2Z, a ubiquitin-conjugating enzyme. Although this gene has not been previously been studied for its role in beta cells, UBE2Z was also reported to be associated with coronary artery disease in different populations [29e31]. UHRF1BP1 was the only gene that was annotated to an active enhancer in pancreatic islets. We and others from the GIANT, MAGIC and ExomeBP consortia, recently identified 40 additional loci associated with T2D [6] and found that TPCN2, a gene encoding for a transmembrane ion channel with an effect on insulin action, also has an eQTL in both PPP and OD. TPCN2 À/À mice display improved insulin sensitivity, which is in agreement with our findings that an increased expression of TPCN2 associated with the rs72928978 reported SNP [6]. We then assessed whether our eQTL SNPs were in linkage disequilibrium with GWAS reported SNPs for T2D and associated traits using the GWAS catalogue ((https://www.ebi.ac.uk/gwas/) and a recently published GWAS [6]. The most significant loci in the PPP and OD  datasets are shown in Table 3. Notably, approximately 60% did not implicate the nearest gene. This includes the CENTD2 (ARAP1) locus, which a recent study confirmed to be associated with T2D risk through the reduced expression of the nearby STARD10 gene [32]. In addition, one of the most significant eQTLs we found was the rs34569841 SNP, which is associated with decreased expression of the long non-coding RNA LOC101927636. This SNP is in strong linkage disequilibrium with the previously reported rs13134327 SNP (R 2 ¼ 1), which is associated with HbA1C and was mapped to the FREM3 gene [33].

eQTL genes differentially expressed with T2D and HbA1C levels
Our eQTL analysis further identified in the PPP dataset only 21 genes (Table 4), which were also differentially expressed, 11 of which also consistently in their directionality. Among them was SCTR, which encodes for the secretin receptor and is the most potent regulator for bicarbonate in the pancreas. Although early studies showed that SCTR is linked to insulin release in humans (Lerner and Porte 1970), this association has not since been re-investigated. eQTLs in the mitochondrial Acid Phosphatase 6 Lysophosphatidic ACP6 have been associated with T2D and BMI [34]. Three loci, namely KLHDC10, VAMP1, and CFLAR lie within active islet enhancer regions. Among them, however, only KLHDC10, which is involved in the activation of oxidative stress [35], was consistent in its directional effect between eQTL and differential expression. In addition, of the differentially expressed loci, only two were transcription factors (ZNF117 and ONECUT2). Lastly, to identify genes whose expression was altered by glycaemia, we correlated gene expression to HbA1C levels, which is a

DISCUSSION
Here we show the most comprehensive cis-eQTL analysis in relevant islet samples in two distinct cohorts, including samples from 100 OD and for the first time 103 PPP (e.g. from living surgical patients). To our knowledge, this is also the largest eQTL study using islet samples to date. Our thorough cohorts allowed us to identify eQTLs with a relevance to T2D and HbA1C levels. These genes have diverse roles within the cell and further studies need to be performed to understand their roles within the pancreatic beta cell (Figure 4). One main finding was the difference in results observed between our two cohorts. We found that the PPP and OD samples cluster separately, suggesting that the differences in transcriptomic signatures between the two cohorts is primarily due in part to the islet isolation methods. Indeed, we recently compared the transcriptomes of PPP and OD islets and found that those of islets extracted from the same ODs using either LCM or enzymatic digestion clustered separately, with LCM islet transcriptomes clustering with the transcriptomes of LCM islets from PPP [13]. These differences were also highlighted in our pathway analysis Figure 4: Summary of eQTL genes with relevance to T2D. A schematic of a representative pancreatic beta cell, with identified genes annotated to their known sub-cellular localization using gene ontology. These genes include overlapping GWAS genes (black), eQTLs in LD with GWAS loci (asterisks), eQTL genes from the PPP dataset that were differentially expressed in T2D (red) or associated with HbA1c (pink), consistent in directional effect with eQTLs. (Figure was illustrated using app.biorender.io). using IPA. In addition, these differences may be attributed to stressful conditions while OD patients were in intensive-care prior to braindeath, which may increase inflammatory markers. In contrast, for LCM samples, the surgical specimen are subject to immediate cryofixation, thereby limiting transcriptomic changes. As all previous eQTL and transcriptomic investigations have been performed using islets isolated by enzymatic digestion from OD [11,12,36], our study including LCM islets from PPP is unique. In addition to isolation methods, methodological differences in RNA expression analyses likely contribute to the low number of shared genes between our study and previous eQTL analyses in islets [11,12]. The reason for the use of microarrays in our study is that at the time these studies were conceived and undertaken, RNA-seq procedures (based on nextgeneration sequencing) were still very expensive and not yet fully consolidated. However, one robust gene that survived these technical and methodological differences and which was replicated amongst all studies is ZFP57. Loss-of-function mutations in the ZFP57 transcription factor is associated with an imprinting disorder, which includes transient neonatal diabetes [37,38]. ZFP57 is expressed in undifferentiated cells and downregulated during cell differentiation [39] and is a crucial regulator involved in DNA methylation during development [40]. It would be interesting to further investigate whether DNA methylation changes within this gene has a role in the cell biology of adult beta cells. In our comparison of GWAS loci and eQTLs, we were able to provide an up-to-date analysis of approximately 300 identified genes linked to T2D and associated traits. An intriguing finding was that long noncoding RNAs were found to be the most significantly associated with GWAS SNPs, which includes the long non-coding RNA LOC101927636 locus, which in GWAS studies was previously assigned to FREM3 [33]. This is in addition to approximately 66 non-coding RNAs identified between the OD and PPP datasets. Accumulating literature has shown that SNPs in long non-coding RNAs are associated with human diseases, including T2D, hence highlighting their role as master regulators [41,42]. It is important to note that islets include a number of cell-types and although we have considered normalising to housekeeping genes identified from single-cell transcriptomic analysis in islets [43,44], these genes were highly variable among our samples. While it is important to bear in mind that some of the eQTLs identified are not beta-cell specific, there is substantial data to gain from studying the islet as an intact microorgan.
In conclusion, we provide a catalogue of cis-eQTLs from the largest todate sample size from two separate cohorts of non-diabetic and T2D subjects and which includes the first in-depth combined genetic and expression analysis of human islets isolated by LCM. Ultimately, the knowledge gained from eQTL approaches provides a preferable and more accurate 3-dimensional representation than those from DNA variants alone. In vitro and functional analyses are required to definitively prove the role of newly identified genes in relation to islet cell biology and T2D disease.