Simultaneous Copy Number Losses within Multiple Subtelomeric Regions in Early-Onset Type 2 Diabetes Mellitus

Genetic factors play very important roles in the onset and progression of type 2 diabetes mellitus (T2DM). However, the genetic factors correlating with T2DM onset have not as yet been fully clarified. We previously found that copy number losses in the subtelomeric region on chromosome 4p16.3 were detected in early-onset Japanese T2DM patients (onset age <35 years) at a high frequency. Herein, we additionally found two novel copy number losses within the subtelomeric regions on chromosomes 16q24.2-3 and 22q13.31-33, which have significant associations with early-onset Japanese T2DM. The associations were statistically significant by Fisher's exact tests with P values of 5.19×10−3 and 1.81×10−3 and odds ratios of 5.7 and 4.4 for 16q24.2-3 and 22q13.31-33, respectively. Furthermore, copy number variation (CNV) analysis of the whole genome using the CNV BeadChip system verified simultaneous copy number losses in all three subtelomeric regions in 11 of our 100 T2DM subjects, while none of 100 non-diabetic controls showed the copy number losses in all three regions. Our results suggest that the mechanism underlying induction of CNVs is involved in the pathogenesis of early-onset T2DM. Thus, copy number losses within multiple subtelomeric regions are strongly associated with early-onset T2DM and examination of simultaneous CNVs in these three regions may lead to the development of an accurate and selective procedure for detecting genetic susceptibility to T2DM.


Introduction
The number of patients with type 2 diabetes mellitus (T2DM) has been increasing annually worldwide. Environmental factors, such as aging, westernized high-fat diets, lack of exercise and everyday stress, affect the onset and progression of T2DM. On the other hand, according to epidemiological studies, genetic factors also play a major role [1]. For example, concordance rates and heritability for T2DM are higher in identical than in fraternal twins [2]. When both parents have T2DM, the risk of this disease in their children is three to four times higher than that of the offspring of non-diabetics [3]. Furthermore, the sibling recurrence risk ratios for T2DM were substantially elevated in families with diabetes in both a parent and a grandparent [4]. Therefore, many investigators worldwide have been intensively searching for genetic factors associated with susceptibility to T2DM.
Loci for rare monogenic forms of familial diabetes, such as maturity-onset diabetes of the young (MODY) [5], including glucokinase gene mutations [6], mitochondrial diabetes [7,8], and Wolfram syndrome [9], have been demonstrated in small proportions of patients. However, the genetic etiology of familial T2DM has not been revealed in the majority of cases. To identify genetic factors conferring susceptibility to T2DM, genome-wide association studies (GWASs) using single nucleotide polymorphism (SNP) markers have been performed. These GWASs and replication studies have identified multiple susceptibility loci, such as TCF7L2 [10], GLIS3 [11], KCNK16 [11,12], MAEA [12] and KCNQ1 [13,14]. However, their odds ratios are consistently in the 1.1-1.4 range [9,13], suggesting their contributions to be relatively low. Furthermore, among the loci detected to date, the initially reported associations have not necessarily been replicated in subsequent studies [15,16].
Other chromosomal mutations, which cover wider genomic regions, include duplication, deletion, insertion, translocation, block substitution, indel (insertion-deletion) and copy number variations (CNVs). Among these genetic variations, CNVs are increasingly being recognized due to their major contributions to a number of diseases, including certain malignancies. Until recently, CNVs were considered to be very rare, since a CNV was thought to have a fatal impact on brain function and/or the survival of individuals [17]. However, recent advances in human genome studies have totally revised this concept. Many case reports have made it clear that CNVs exist even in healthy individuals [15,16].
In our search for other CNVs conferring genetic susceptibility to T2DM, we performed genome-wide CNV analyses in early-onset Japanese T2DM subjects. Early-onset patients had developed T2DM before environmental and acquired factors would have had major impacts, suggesting these patients to have stronger inherited factors. Therefore, we enrolled T2DM subjects whose onsets had been before 35 years of age. As a result, 94 of our 100 T2DM subjects had a family history of T2DM within third-degree relatives [32]. We first screened CNVs in the whole genome using the deCODE-Illumina CNV370K BeadChip system, which focuses on the CNV-rich regions of the human genome, followed by validation and characterization using an Agilent region-targeted high-density custom-made oligonucleotide tiling microarray [32]. We found two novel copy number losses within the subtelomeric regions on chromosomes 16q24.2-3 and 22q13.31-33, which have associations with early-onset diabetes. In addition, importantly, copy number losses in chromosomes 4p16.3, 16q24.2-3 and 22q13.31-33 were often detected simultaneously in the same subjects. In particular, all three copy number losses were verified in 11 of our 100 T2DM patients, while none of the non-diabetic controls showed simultaneous copy number losses in these regions. Our results suggest that the combination of CNVs in these three genome regions is a potential DNA marker for T2DM, and that the mechanism underlying induction of CNVs, rather than a single gene in these CNV regions, is involved in the pathogenesis of T2DM.

Study population
We considered early onset of T2DM to reflect genetic factors more than environmental factors. Therefore, we enrolled diabetic patients with early-onset disease as study subjects. One hundred unrelated Japanese T2DM patients, who had developed T2DM before 35 years of age and whose maximum body mass index (BMI) was less than 35 kg/m², were the same subjects as in our previous study [32]. They were recruited at Tohoku University Hospital and affiliated hospitals and medical clinics. Diabetes had been diagnosed by applying the WHO criteria. Type 1 diabetes mellitus was excluded based on clinical features and the absence of anti-GAD (glutamic acid decarboxylase) antibodies or anti-IA-2 (insulinoma-associated antigen-2) antibodies. Patients with diabetes mellitus due to hepatic disease, pancreatic disease, other endocrine disorders or mitochondrial DNA mutations, and those with drug-induced diabetes, based on laboratory data and clinical history, were excluded. We also studied 100 non-diabetic control subjects, again the same subjects as in our previous study [32]. These control subjects were enrolled using the following criteria: 60 or more years of age, no prior diagnosis of diabetes mellitus, HbA1c less than 6.4% and absence of a family history of T2DM within third-degree relatives. Applying these criteria allowed us to exclude subjects with a high likelihood of later developing T2DM. HbA1c (%) was estimated as an NGSP (National Glycohemoglobin Standardization Program) equivalent value (%) calculated by the following formula: HbA1c (%) = HbA1c (JDS: Japan Diabetes Society value) (%) + 0.4%. The value of HbA1c (JDS) (%) was measured based on the previous Japanese standard substance and measurement methods.
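The HbA1c conversion and the control enrollment criteria described above can be illustrated with a minimal sketch. The function and parameter names are hypothetical and chosen only for illustration; the thresholds (60 years, 6.4%, +0.4%) are those stated in the text.

```python
def jds_to_ngsp(hba1c_jds: float) -> float:
    """Convert an HbA1c value (%) from the JDS scale to the NGSP-equivalent scale."""
    return hba1c_jds + 0.4


def is_eligible_control(age_years: int,
                        hba1c_jds: float,
                        prior_diabetes_diagnosis: bool,
                        family_history_t2dm: bool) -> bool:
    """Apply the four control criteria stated in the text.

    family_history_t2dm refers to T2DM within third-degree relatives.
    """
    return (age_years >= 60
            and not prior_diabetes_diagnosis
            and jds_to_ngsp(hba1c_jds) < 6.4
            and not family_history_t2dm)


if __name__ == "__main__":
    # Illustrative values only, not study data.
    print(is_eligible_control(65, 5.2, False, False))
    print(is_eligible_control(65, 6.5, False, False))
```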
Genetic analysis of human subjects was approved by the ethics committee of Tohoku University Graduate School of Medicine. After a detailed description of study participation, written informed consent was obtained from each subject.

Screening methods with the Whole-Genome CNV BeadChip System
As previously reported [32], we screened the whole genome by CNV analysis using the deCODE-Illumina CNV370K BeadChip system (Illumina Infinium System, deCODE Genetics, Inc., Iceland), which, in addition to Hap300 SNP marker contents, has CNV probes designed to target the CNV-rich regions of the whole genome. The CNV portion of the platform consists of probes covering CNV-rich regions of the genome, such as megasatellites (tandem repeats >500 bp), duplicons (regions flanked by highly homologous segmental duplications >1 kb), unSNPable regions (>15 kb gaps in the HapMap SNP map, and 5-15 kb gaps with >2 SNPs with Hardy-Weinberg failure), and CNVs registered in the Database of Genomic Variants. The CNV portion of the probe content consists of 15,559 CNV segments covering 190 Mb, or 6% of the human genome. The platform has been tested in 4000 Icelandic and HapMap samples.
Data analysis of the deCODE-Illumina CNV chip was carried out using DosageMiner software developed by deCODE genetics, and loss/gain analysis consisted of the following four steps: (1) intensity normalization and GC content correction, (2) removal of batch effects using principal component analysis, (3) calling of clusters using a Gaussian mixture model, and (4) determination of CNV type using graphical constraints. In brief, CNVs were identified when CNV events stood out in the data, as all sample intensities for CNV probes should be increased or decreased relative to neighboring probes that are not in a CNV region. To determine deviations in signal intensity we started by normalizing the intensities. The normalized intensities for each color channel were determined by an equation and fit formula developed by deCODE genetics. A stretch with occurrence of more than one marker showing an abnormality in the copy number in a consecutive stretch in the genome is considered more likely to be evidence of deletion or gain [22].

High-Density Custom-Made Oligonucleotide Tiling Microarray Analysis
DNA samples from early-onset T2DM subjects were subjected to Agilent's high-density custom-made oligonucleotide tiling microarray analysis based on an array comparative genomic hybridization (aCGH) assay as previously reported. We fabricated a custom-designed microarray targeted to a 1.75-Mb genome region in the subtelomere at 16q24.2-3 (Chr. 16: 86,950,000-88,700,000 (NCBI Build 36.1, hg18)), and a 4.8-Mb genome region in the subtelomere at 22q13.31-33 (Chr. 22: 44,750,000-49,550,000 (NCBI Build 36.1, hg18)) according to previously described methods [33,34]. In short, we used the Agilent website (http://earray.chem.agilent.com/earray/) to select and design our custom tiling array; the array consisted of probes 60-mer in size (Agilent Technologies, Santa Clara, CA, USA).
Tiling-aCGH experiments were performed essentially as described previously [35]. In brief, test and reference (NA19000, a Japanese male from the HapMap project) genomic DNAs (250 ng per sample) were fluorescently labeled with Cy5 (test) and Cy3 (reference) with a ULS Labeling Kit (Agilent Technologies).
For each sample, respective labeling reactions were mixed and then separated prior to hybridization for each of the arrays. Labeled test and reference DNAs were combined, denatured, preannealed with Cot-1 DNA (Invitrogen, 1600 Faraday Avenue, Carlsbad, CA, USA) and a blocking agent, and then hybridized to the arrays for 24 hours in a rotating oven at 65°C and 20 rpm (Agilent Technologies). After hybridization and washing, the arrays were scanned at 3 μm resolution with an Agilent G2505C scanner. Images were analyzed with Feature Extraction Software 10.7.3.1 (Agilent Technologies), with the CGH 107 Sep09 protocol for background subtraction and normalization. Abnormal copy numbers (losses and gains) in a complex multi-copy variable region were detected with the high-density tiling array as deviations of probe log2 ratios that exceeded a threshold of 1 SD from the median probe ratio, according to procedures described previously [35][36][37]. We defined two copy number classes, that is, "unchanged copy number" and "copy number loss." "Unchanged copy number" was defined as the log2 ratio staying within the mean ± 1 SD distribution for the normal population. "Copy number loss" was considered to be present when the downward deviation of log2 ratios exceeded a threshold of 1 SD from the median probe ratio.
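The 1 SD thresholding rule described above can be sketched as follows. This is a simplified illustration, not the analysis pipeline used in the study: the function name, the use of a separate set of reference log2 ratios to estimate the median and SD, and the "gain" label for upward deviations are assumptions made for this example.

```python
from statistics import median, stdev


def call_copy_number(log2_ratios, reference_ratios):
    """Label each probe log2 ratio relative to a reference distribution.

    reference_ratios: log2 ratios assumed to represent the normal population
    (an assumption of this sketch). A probe is called "loss" when it falls
    more than 1 SD below the reference median, "gain" when more than 1 SD
    above it, and "unchanged" otherwise.
    """
    center = median(reference_ratios)
    sd = stdev(reference_ratios)
    calls = []
    for ratio in log2_ratios:
        if ratio < center - sd:
            calls.append("loss")
        elif ratio > center + sd:
            calls.append("gain")
        else:
            calls.append("unchanged")
    return calls


if __name__ == "__main__":
    # Illustrative values only, not study data.
    reference = [-0.1, 0.0, 0.1, 0.0, -0.05, 0.05]
    print(call_copy_number([-0.9, 0.02, -0.05, 0.5], reference))
```

In practice, calls on single probes are noisy; the text notes that consecutive stretches of deviating markers are treated as stronger evidence, which this per-probe sketch does not model.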

Statistical analysis
Statistical analyses were performed using JMP and Microsoft Excel. Chi-square tests were used to compare characteristics of cases and controls, and the clinical characteristics of T2DM associated with combined copy number loss. Fisher's exact tests were used to examine whether there were differences in the copy number losses between cases and controls.
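For readers who wish to reproduce this type of comparison, a two-sided Fisher's exact test on a 2×2 table can be sketched in plain Python. This is an illustrative implementation, not the JMP procedure used in the study; it assumes the common two-sided convention of summing the probabilities of all tables that are no more likely than the observed one, and the function name is hypothetical.

```python
from fractions import Fraction
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a, b: cases with / without copy number loss
    c, d: controls with / without copy number loss
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x losses among cases, margins fixed.
        return Fraction(comb(row1, x) * comb(row2, col1 - x), total_tables)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Exact rational arithmetic avoids floating-point ties between tables.
    p_value = sum(p for p in (prob(x) for x in range(lowest, highest + 1))
                  if p <= p_observed)
    return float(p_value)


if __name__ == "__main__":
    # Counts as reported in the Results for 16q24.2-3 and 22q13.31-33.
    print(fisher_exact_two_sided(15, 85, 3, 97))
    print(fisher_exact_two_sided(22, 78, 6, 94))
```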

Results
In searching for other CNVs associated with early-onset T2DM, we screened the whole genome for CNVs using the deCODE-Illumina CNV370K BeadChip system in the same 100 early-onset Japanese T2DM subjects and 100 non-diabetic controls that we reported previously [32]. Characteristics of the diabetic cases and non-diabetic controls are shown in Table 1. To search for other CNVs as genetic variations possibly conferring susceptibility to T2DM, we performed genome-wide CNV analyses in early-onset Japanese T2DM subjects and non-diabetic controls. Early-onset patients had developed T2DM before environmental factors would have had major impacts, suggesting these patients to have stronger genetic factors. Therefore, we selected 100 Japanese T2DM subjects whose onsets had been before 35 years of age. In fact, 94 of our 100 T2DM subjects had a family history of T2DM within third-degree relatives (Table 1). As non-diabetic controls, genomes obtained from subjects who were at least 60 years of age and had not developed diabetes were used. In addition to the CNV in 4p16.3 [32], interestingly, those of 16q24.2-3 and 22q13.31-33 were found to be located in the subtelomeric regions of each chromosome. At the CNV on 16q24.2-3, 15 of the 100 T2DM subjects displayed copy number loss around this CNV marker, compared with only 3 of the 100 non-diabetic control samples. At the CNV on 22q13.31-33, 22 of the 100 T2DM subjects displayed copy number loss around this CNV marker, compared with only 6 of the 100 control samples. Hereafter, we focused on the two subtelomeric CNVs on 16q24.2-3 and 22q13.31-33. Figure 1 shows the patterns of alterations in copy number loss observed in the 100 early-onset Japanese T2DM subjects and non-diabetic controls.
Among diabetic subjects, 15 showed copy number losses in the 16q24.2-3 subtelomeric region, whereas only 3 of the 100 non-diabetic control samples had copy number losses in this region (P = 5.19×10−3, OR = 5.7, 95% confidence interval 1.6-20.4) (Figure 1A). For the 22q13.31-33 subtelomeric region, 22 diabetic subjects showed copy number losses around the gap region, whereas only 6 of the 100 non-diabetic control samples had copy number losses in this region (P = 1.81×10−3, OR = 4.4, 95% confidence interval 1.7-11.4) (Figure 1B). We observed two copy number patterns, that is, "unchanged copy number" and "copy number loss", at these two regions in our diabetic and control populations, while none of our subjects had detectable "copy number gain". Copy number loss was frequently observed at the 16q24.2-3 and 22q13.31-33 subtelomeric regions in early-onset T2DM subjects.
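The odds ratios above follow directly from the reported counts, as the short sketch below illustrates. The confidence interval here uses the Woolf (log odds ratio) approximation, which is an assumption of this example: the source does not state which interval method was used. The function name is hypothetical.

```python
from math import exp, log, sqrt


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf (log-based) confidence interval for [[a, b], [c, d]].

    a, b: cases with / without copy number loss
    c, d: controls with / without copy number loss
    z: normal quantile for the interval (1.96 for an approximate 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio (Woolf's method).
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    for label, table in (("16q24.2-3", (15, 85, 3, 97)),
                         ("22q13.31-33", (22, 78, 6, 94))):
        odds_ratio, lower, upper = odds_ratio_with_ci(*table)
        print(f"{label}: OR = {odds_ratio:.1f}, 95% CI {lower:.1f}-{upper:.1f}")
```

For example, for 16q24.2-3 the odds ratio is (15 × 97) / (85 × 3) ≈ 5.7.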
To verify the CNV BeadChip results, we analyzed copy number changes along these two regions in the subtelomere using a high-density custom-made oligonucleotide tiling microarray. Structural details of the copy number losses in each of four representative T2DM patients with copy number losses in 16q24.2-3 and 22q13.31-33 subtelomeric regions are shown in Figures 2A and 2B, respectively. Individual copy number plots using moving average (y-axis) versus distance along the chromosome (x-axis) are presented. For comparison, copy number plots of healthy individuals who did not exhibit copy number alterations in the region are also shown ((e) and (f) of Figures 2A and 2B). Using this procedure, we examined copy number losses in the 16q24.2-3 and 22q13.31-33 subtelomeric regions in 10 of 15 and 12 of 22 subjects, respectively, in whom copy number losses in each region had been detected by means of the first genome-wide screening using the deCODE-Illumina CNV370K BeadChip system. Copy number losses were confirmed in all 22 subjects examined (Figures 3A and 3B). In addition, the high-density tiling microarray further verified these copy number losses to actually be segmental losses. Thus, the first screening results were confirmed by the more detailed procedure.
As we carried out these analyses, we recognized that the CNVs were often detected in the same subjects. For instance, all 10 subjects who were confirmed to have the copy number loss in 16q24.2-3 using the high-density tiling microarray analysis simultaneously had the copy number loss in 22q13.31-33. Therefore, we calculated the frequencies of these two CNVs as well as the one on 4p16.3 [32] in all 100 diabetic and 100 non-diabetic subjects, according to the results obtained using the deCODE-Illumina CNV BeadChip system. As shown in Figure 4A, all diabetic subjects with copy number loss on 4p16.3 or 16q24.2-3 had another CNV(s). In addition, 15 of the 22 diabetics with copy number loss on 22q13.31-33 simultaneously had another CNV(s). Intriguingly, 11 of the 100 T2DM subjects had copy number losses in all three subtelomeric regions (Figure 4A), while none of the 100 controls had copy number losses in all three regions (Figure 4B) (P = 7.00×10−4 by Fisher's exact test). In contrast, of the study subjects with copy number losses in two of the three subtelomeric regions, 5 were diabetic (Figure 4A), while 2 were controls (Figure 4B), with the difference not reaching statistical significance (P = 4.45×10−1). Thus, simultaneous copy number losses in these three different chromosomes are clearly associated with T2DM development and may become a novel DNA marker, useful at least for detecting subjects likely to have early-onset T2DM.
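The tallying of simultaneous losses per subject described above can be sketched as follows. The data structure (a mapping from subject identifier to the set of regions with a detected loss), the function name, and the subject identifiers are hypothetical; the example input is illustrative and is not study data.

```python
from collections import Counter

# The three subtelomeric regions examined in this study.
REGIONS = ("4p16.3", "16q24.2-3", "22q13.31-33")


def tally_simultaneous_losses(losses_by_subject):
    """Count subjects by the number of regions (0 to 3) showing copy number loss.

    losses_by_subject: dict mapping a subject identifier to the set of
    regions in which a copy number loss was called for that subject.
    """
    return Counter(len(set(lost) & set(REGIONS))
                   for lost in losses_by_subject.values())


if __name__ == "__main__":
    # Illustrative input only.
    example = {
        "S1": {"4p16.3", "16q24.2-3", "22q13.31-33"},
        "S2": {"22q13.31-33"},
        "S3": set(),
        "S4": {"4p16.3", "16q24.2-3"},
    }
    counts = tally_simultaneous_losses(example)
    for n_regions in range(len(REGIONS) + 1):
        print(n_regions, counts[n_regions])
```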
We next examined the clinical characteristics of T2DM associated with combined copy number losses. Sex and onset age did not differ between patients with copy number losses in all three subtelomeric regions (n = 11) and those without copy number losses in all three regions (n = 89). All 11 patients with copy number losses in all three subtelomeric regions had a family history of T2DM within third-degree relatives, supporting the notion of combined copy number losses being a genetic factor conferring high susceptibility to T2DM development. However, maximal body mass index, HbA1c and postprandial plasma glucose levels did not differ between these two groups (Table 2). In addition, urinary C-peptide reactivity levels and the proportions of patients receiving insulin therapy were similar (Table 2), suggesting that patients with copy number losses in all three regions do not have specific clinical features. Therefore, it is possible that these combined copy number losses would often be detectable in early-onset T2DM patients. Thus, detection of these copy number losses may be a novel and useful diagnostic measure for predicting the early onset of T2DM.

Discussion
This study revealed two novel copy number losses in the subtelomeric regions in 16q24.2-3 and 22q13.31-33. Our initial genome-wide screening with the deCODE-Illumina CNV370K BeadChip system for association with early-onset T2DM revealed novel losses in the subtelomeric regions of both 16q24.2-3 and 22q13.31-33. Subsequent high-density custom-made oligonucleotide tiling microarray analysis verified copy number losses in these regions. Furthermore, taken together with the results of our previous study [32], simultaneous copy number losses in three (4p16.3, 16q24.2-3 and 22q13.31-33) subtelomeric regions were detected in 11 of 100 T2DM patients, while none of the 100 controls had copy number losses in all three subtelomeric regions (Figure 4). This result suggests that detecting simultaneous copy number losses in these three subtelomeric regions constitutes a potentially novel and useful strategy, with high accuracy and selectivity, for predicting susceptibility to early-onset T2DM.
The CNV regions found in 4p16.3, 16q24.2-3 and 22q13.31-33 contain a number of genes (Figure 3) which might be involved in T2DM development. For example, genes involved in the glucose-induced insulin secretion cascade of pancreatic beta-cells, such as ATP5I (ATP synthase, H+ transporting, mitochondrial F0 complex, subunit E) [38], CPLX1 (complexin 1) [20] and GAK (cyclin G associated kinase) in association with CDK5 (cyclin-dependent kinase 5), are located in 4p16.3 [39,40]. PPARα (peroxisome proliferator-activated receptor alpha) located in 22q13.31-33 is reportedly involved in T2DM onset and progression [32,41]. Therefore, there is a possibility that these and other as yet unknown genes are responsible for T2DM pathogenesis through their impairments caused by copy number losses. However, these genes were not necessarily all deleted in the 11 patients who had copy number losses in all three subtelomeric regions (Figure 2). In addition, we also discovered marked accumulation of CNVs in the same subjects with T2DM, as compared with non-diabetic controls. In particular, 11 of our T2DM subjects harbored all three subtelomeric copy number losses, whereas none of the controls did. These results suggest that copy number losses are not a direct cause of T2DM development. Instead, copy number losses per se appear to have been induced by another major mechanism(s) which is also involved in T2DM development. Therefore, a putative cause for CNVs, rather than a single gene defect in the CNV regions, is involved in the onset and progression of T2DM.
As shown in Table 2, the patients with all three copy number losses did not exhibit specific clinical characteristics of T2DM. Body weights, insulin secretion and the proportions receiving insulin therapy did not differ from those of other early-onset T2DM patients. Therefore, as yet, we cannot distinguish these patients with CNVs based on their clinical findings. In other words, these patients may be identified among the entire population of early-onset T2DM patients, and possibly even in the late-onset T2DM population as well. More studies are needed to elucidate the rate of CNVs in T2DM subjects, in general, by examining these CNVs in larger populations. Since DNA samples were collected from individuals living in northeastern Japan, local factors should be taken into consideration. However, there are no apparent locality differences between our T2DM subjects and the non-diabetic controls. In addition, neither common nor other CNVs tended to be shared between the patient and control groups (data not shown). These results collectively indicate the combined CNVs to be strongly associated with T2DM, at least in subjects living in northeastern Japan. Considering that genomic loci uniquely correlating with T2DM have been found in East Asian countries like Japan, Korea and China [30], the CNVs in these three regions might be related to population-specific inherited factors. In addition, because of the small sample size, we cannot rule out the possibility that the contribution of CNVs to T2DM development might be overestimated. Further intensive studies with a larger population are needed to clarify these issues, including generality, local factors and pathogenesis.
In conclusion, copy number losses within multiple subtelomeric regions are strongly associated with early-onset T2DM. Examining the CNVs in three regions may be a useful strategy for predicting T2DM as well as possibly providing insights into its pathogenesis.