Sequencing and association analysis of the type 1 diabetes-linked region on chromosome 10p12-q11

Background: In an effort to locate susceptibility genes for type 1 diabetes (T1D), several genome-wide linkage scans have been undertaken. A chromosomal region designated IDDM10 retained genome-wide significance in a combined analysis of the main linkage scans. Here, we studied sequence polymorphisms in 23 Mb on chromosome 10p12-q11, including the putative IDDM10 region, to identify genes associated with T1D.

Results: Initially, we resequenced the functional candidate genes CREM and SDF1, located in this region, genotyped 13 tag single nucleotide polymorphisms (SNPs) and found no association with T1D. We then undertook analysis of the whole 23 Mb region. We constructed and sequenced a contig tile path from two bacterial artificial chromosome (BAC) clone libraries. By comparison with a clone library from an unrelated person used in the Human Genome Project, we identified 12,058 SNPs. We genotyped 303 SNPs and 25 polymorphic microsatellite markers in 765 multiplex T1D families and followed up 22 associated polymorphisms in up to 2,857 families. We found nominal evidence of association in six loci (P = 0.05–0.0026) located near the PAPD1 gene. Therefore, we resequenced 38.8 kb in this region, found 147 SNPs and genotyped 84 of them in the T1D families. We also tested 13 polymorphisms in the PAPD1 gene and in five other loci in 1,612 T1D patients and 1,828 controls from the UK. Overall, only the D10S193 microsatellite marker, located 28 kb downstream of PAPD1, showed nominal evidence of association in both the T1D families and the case-control sample (P = 0.037 and 0.03, respectively).

Conclusion: We conclude that polymorphisms in the CREM and SDF1 genes have no major effect on T1D. The weak T1D association that we detected in the association scan near the PAPD1 gene may be either false or due to a small genuine effect, and cannot explain linkage at the IDDM10 region.



Background
Type 1 diabetes (T1D) is the second most common chronic disease in children. It develops as a result of a complex interaction of genetic and environmental factors leading to the immune-mediated destruction of the insulin-producing pancreatic β-cells. Genetic predisposition has a significant role in T1D as suggested by familial clustering of the disease and increased concordance among monozygotic twins [1]. The identification and localization of susceptibility genes for complex traits or common diseases has made slow progress, owing to many factors including small effect sizes, incomplete knowledge of the polymorphism content of the genome and its patterns of linkage disequilibrium and lack of inexpensive genotyping technologies. One approach to narrowing down to specific genome regions that might contain a susceptibility gene or genes has been to carry out linkage studies in affected sib-pair families. In contrast to monogenic diseases, this approach has had limited success in multifactorial diseases.
Nevertheless, in T1D, combined analyses of several studies provided evidence for four linked regions: the major locus MHC on 6p21 (previously designated IDDM1), 10p14-q11 (IDDM10), 2q31-q33 (IDDM7 and IDDM12) and 16q22-q24 [2]. Here we undertook an analysis of sequence polymorphisms in the putative IDDM10 region, comprising a 23 Mb region on chromosome 10p12-q11. We identified a large number of single nucleotide polymorphisms (SNPs) and performed an association scan of this region in a large collection of T1D families, as well as in unrelated patients and controls.

Results
Initially, in order to identify T1D genes in the IDDM10 region, we adopted a candidate gene approach. Previously we examined the GAD2 gene, which encodes the major T1D autoantigen GAD65, and found no evidence of association [3]. Here we resequenced and studied association of two candidate genes, CREM and SDF1, which also map to this region. The cyclic adenosine 5'-monophosphate responsive element modulator (CREM) has been shown to bind to the interleukin-2 gene promoter and suppress expression of this cytokine [4], which is critical for the initiation and termination of the immune response as well as for T cell development. Increased CREM expression was found in T cells of patients with another autoimmune disease, systemic lupus erythematosus [5]. By resequencing the CREM gene we identified 32 SNPs, including 13 novel SNPs (Supplementary Table 1, see Additional file 1). We selected six tag SNPs and genotyped them in 1,612 T1D patients and 1,828 controls from the UK. The multilocus test gave P = 0.98, indicating that common variants of CREM do not affect T1D susceptibility in a major way.
By resequencing the gene for the cytokine stromal cell-derived factor 1 (SDF1 or CXCL12) we identified 33 variants, including two insertion/deletion polymorphisms, 21 of which were novel (Supplementary Table 2, see Additional file 2). We selected six tag SNPs, genotyped them in 1,612 cases and 1,828 controls and found no association (multilocus P = 0.67). We also tested SNP rs1801157, also known as 3'A(801G>A), in the evolutionarily conserved 3' untranslated segment of SDF1, which had previously been associated with early onset of T1D [6,7]. We attempted to replicate these findings and genotyped rs1801157 in 1,800 T1D families from the UK, USA and Norway. The A allele and AA genotype frequencies were very similar to those reported previously (19.4% and 5.8%, respectively). The transmission disequilibrium test revealed no association with T1D (507 transmitted A alleles and 530 untransmitted, P = 0.47; relative risk for AA genotype = 1.01, 95% CI = 0.87-1.16, P = 0.89). Even though we obtained no evidence of association, we subdivided the families by age-at-onset and by HLA-DRB1 genotype because the two previous studies had carried out subgroup analyses. However, we found no association in any subgroup (data not shown).
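The transmission counts reported above for rs1801157 can be checked with a short calculation. The sketch below assumes the standard McNemar-type form of the transmission disequilibrium test, in which the statistic (T - U)² / (T + U) is compared with a chi-square distribution with one degree of freedom; the function name `tdt` is illustrative only, and the software actually used for the analysis may have computed the test differently.

```python
import math


def tdt(transmitted, untransmitted):
    """Return (chi-square statistic, P-value) for a McNemar-type TDT.

    Assumes one degree of freedom and counts taken from heterozygous
    parents only; this is a textbook form, not the authors' code.
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative transmissions")
    chi2 = (transmitted - untransmitted) ** 2 / total
    # Survival function of a 1-d.f. chi-square: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Counts for the A allele of rs1801157 as reported in the text
chi2, p = tdt(507, 530)
print(f"chi2 = {chi2:.3f}, P = {p:.3f}")
```

Under these assumptions the P-value falls close to the reported 0.47, which is consistent with the absence of transmission distortion.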
We then conducted a comprehensive genetic analysis of the whole IDDM10 region in order to systematically identify new T1D gene(s). As part of the Human Genome Project (HGP), the Wellcome Trust Sanger Institute constructed a single tile path, i.e. a set of overlapping bacterial artificial chromosome (BAC) clones derived from two different libraries [8,9]. The overlaps between clones in this tile path had previously been checked for SNPs, which were deposited in the dbSNP database [10]. In order to discover additional novel SNPs in the IDDM10 region, we constructed a second tile path that covers the whole region using clones from both BAC libraries, so that finished genome sequence from one library was complemented by a clone from the second library, i.e. from a different individual. This second tile path was then shotgun sequenced. Thus, we revealed additional polymorphic sites located outside BAC overlaps in the initial HGP tile path. In total we identified and submitted to dbSNP 12,058 SNPs, of which 10,808 were uniquely mapped onto the human genome build 34, including 1,320 SNPs that were novel, i.e. not present in dbSNP build 120. These SNPs contributed substantially to the polymorphism content of the IDDM10 region.
We then screened sequence polymorphisms between 21.0 Mb and 44.3 Mb of chromosome 10 (NCBI genome build 34), an interval that includes the IDDM10 region, for association with T1D. In total, 303 SNPs and 25 polymorphic microsatellite markers/short tandem repeats (STRs) were genotyped in up to 765 families with two affected offspring (Supplementary Table 3, see Additional file 3). This sample includes the families in which linkage of IDDM10 was characterized initially [11][12][13]. We found 14 polymorphisms in nine loci showing nominal evidence of association with T1D (P < 0.05). In order to investigate these results further, we genotyped these polymorphisms in an additional set of T1D families (Table 1). In the combined analyses of up to 2,857 families we found some evidence for T1D association of D10S193, rs1963187 and rs2480285 (P = 0.037, 0.0074 and 0.0026, respectively), which are clustered in a 97 kb region (coordinates: chr10; 30,577,375..30,674,697; NCBI genome build 34). Their associations with the disease were largely independent of each other (r² = 0.24 between rs1963187 and rs2480285; r² = 0 and 0.01 between the risk-associated allele 226 of D10S193 and rs1963187 and rs2480285, respectively).
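For readers unfamiliar with the r² measure quoted above, the following sketch shows how it is conventionally computed for two biallelic markers from phased haplotype frequencies. The function name and the example frequencies are hypothetical and are not taken from the study data; how haplotype frequencies were estimated in the families is not described here.

```python
def ld_r2(freq_ab, freq_a, freq_b):
    """Squared correlation (r^2) between alleles A and B at two loci.

    freq_ab: frequency of the haplotype carrying both A and B
    freq_a, freq_b: marginal frequencies of allele A and allele B
    (all values here are hypothetical inputs, not study data)
    """
    denominator = freq_a * (1 - freq_a) * freq_b * (1 - freq_b)
    if denominator == 0:
        raise ValueError("both loci must be polymorphic")
    d = freq_ab - freq_a * freq_b  # linkage disequilibrium coefficient D
    return d * d / denominator


# Two alleles that always occur together: complete correlation
print(ld_r2(0.5, 0.5, 0.5))
# Two alleles that occur together only as often as chance predicts
print(ld_r2(0.25, 0.5, 0.5))
```

Values near zero, such as those reported between D10S193 allele 226 and the two SNPs, indicate that one marker carries almost no information about the other.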
Two genes localize within 150 kb of the D10S193, rs1963187 and rs2480285 markers (Figure 1 and [14]). The polyA polymerase associated domain containing 1 (PAPD1) gene encodes a protein with a nucleic acid binding PAP/25A-associated domain. The associated polymorphisms flank PAPD1, while the second gene, MAP3K8 (mitogen-activated protein kinase kinase kinase 8), is located 50–177 kb away from the associated polymorphisms. We then searched for novel sequence polymorphisms in this region. Using a panel comprising eight Caucasian individuals, we resequenced 38.8 kb in the 177.2 kb region between D10S193 and exon 9 of the [...] Then, in addition to the five polymorphisms (two STRs and three SNPs) in the PAPD1-MAP3K8 region that were already genotyped, we tested 84 SNPs identified by resequencing. At first we studied association in 458 UK families (Supplementary Table 5, see Additional file 5). Subsequently, seven SNPs that were suggestively associated in these UK families (P = 0.012–0.073) were genotyped in the extended set of up to 2,857 T1D families. Thus, overall we studied 12 polymorphisms in the PAPD1-MAP3K8 region in all available T1D families. We found that alleles of six SNPs that localize in the PAPD1 gene show nominal evidence of association with T1D risk or protection (P = 0.0026–0.031, Table 1). We then genotyped an additional sample of 1,693 unrelated T1D patients and 1,805 controls from the UK (Table 2) for six PAPD1 polymorphisms that were associated in the previous analysis of the extended family set. We found that only the microsatellite marker D10S193, located 28 kb downstream of PAPD1, was weakly associated in this sample (P = 0.03, Table 2). Thus allele 226 of D10S193 was weakly associated with T1D risk both in the families (relative risk [RR] = 1.15, P = 0.019) and in the case-control analysis (OR = 1.16, P = 0.078).
Another D10S193 allele, 228, was associated with protection from T1D in cases and controls (OR = 0.73, P = 0.006), but not in the families (RR = 0.96, P = 0.59).
Additionally, we followed up seven SNPs in the MYO3A, HRNPF, NRP1 and SVIL gene regions that had shown some evidence of association in the T1D families (P = 0.0092–0.04, Table 1). We genotyped these SNPs in 1,693 T1D patients and 1,805 controls, but found no association with T1D (Table 2).

Discussion
Overall, the association signals that we detected in the IDDM10 region near the PAPD1 gene did not reach genome-wide significance levels, despite our having tested large samples of T1D families, cases and controls. This association could be spurious or could indicate a small genuine effect located near the PAPD1 gene that we did not have the statistical power to demonstrate at a genome-wide significance level. Such an effect could not explain the reported evidence of T1D linkage at this region of chromosome 10 (λs = 1.12). If IDDM10 is a true disease locus, it could be caused by a single common contributory variant with a strong effect (such as OR = 2 and a minor allele frequency of 0.2) or, more likely, by a number of variants with smaller effects located in this region. Therefore, further association studies in datasets that are powered to identify weak genetic effects (e.g. OR = 1.3–1.5) are needed to discover these type 1 diabetes genes.

Figure 1. The PAPD1-MAP3K8 gene region on chromosome 10.

Conclusion
We identified a large number of SNPs for genetic studies in the IDDM10 region using a novel sequencing strategy, performed a first T1D association scan of this region and eliminated the possibility that two functional candidate genes, CREM and SDF1, have a major effect on T1D. The weak association signal near the PAPD1 gene detected in the association scan may be either false or due to a small genuine effect, and cannot explain the previously observed strong linkage in the IDDM10 region.

SNP identification
Repeats were masked using RepeatMasker [15], and the masked sequences were then used in pair-wise sequence alignments by the Sequence Search and Alignment by Hashing Algorithm (SSAHA) to map clone sequences and find SNPs [16]. Overlaps ≥ 2 kb were considered for SNP identification. We only checked overlaps between clones in different tile paths, as overlaps from the finished tile path had been checked for SNPs previously. A file of overlap pairs was derived and used as input for a script that calls SSAHA and then parses the resulting alignments. To avoid false SNP calling due to misalignment, clusters of five or more SNPs were rejected when each one was less than 10 bp away from neighboring SNPs. We then mapped SNPs on the human genome consensus path (NCBI build 34) using the mapping information from the clone and the SSAHA algorithm. The tile paths and SNPs can be viewed on our website [14]. Information on all polymorphisms has been submitted to dbSNP [17].
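The cluster-rejection rule described above can be illustrated with a small filter. This is a minimal sketch of one possible reading of the rule, in which a cluster is a maximal run of SNPs whose consecutive spacing is below 10 bp and runs of five or more SNPs are discarded; the function name, parameter names and example positions are hypothetical, and the original script that parsed the SSAHA alignments is not reproduced here.

```python
def filter_snp_clusters(positions, max_gap=10, min_cluster=5):
    """Drop candidate SNPs that sit in dense clusters.

    A cluster is taken to be a maximal run of positions in which each
    consecutive pair is less than `max_gap` bp apart. Runs containing
    `min_cluster` or more SNPs are rejected as likely misalignments.
    (This interpretation of the rule is an assumption.)
    """
    kept = []
    run = []
    for pos in sorted(set(positions)):
        # A gap of max_gap or more closes the current run
        if run and pos - run[-1] >= max_gap:
            if len(run) < min_cluster:
                kept.extend(run)
            run = []
        run.append(pos)
    # Handle the final run
    if len(run) < min_cluster:
        kept.extend(run)
    return kept


# Hypothetical coordinates: five tightly packed SNPs, then three isolated ones
candidates = [100, 104, 109, 113, 118, 500, 505, 2000]
print(filter_snp_clusters(candidates))
```

In this example the first five positions form a dense cluster and are removed, while the pair at 500 and 505 is retained because the run contains fewer than five SNPs.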

Subjects
The study was carried out according to the principles of the Helsinki Declaration. We obtained permission from the relevant ethical committees and informed consent from the participating subjects. Initially, we genotyped 329 polymorphisms in up to 765 Caucasian families, each with two children affected with T1D, comprising 458 Diabetes UK Warren families and 307 Human Biological Data Interchange (HBDI) families. An extended set of families with at least one affected child comprised families from the UK (n = 1,781, including the 458 Diabetes UK Warren families), Norway (n = 359), Romania (n = 352) and the USA (n = 365, including the 307 HBDI families). Therefore, in total we studied 2,857 T1D families; the exact number of affected offspring genotyped for each SNP is shown in Table 1. Assuming a multiplicative genetic model, this extended set of T1D families provides over 80% power to detect genetic effects with odds ratios (OR) of 1.25 and 1.5 for alleles at frequencies of 40% and 7%, respectively, at α = 10⁻⁶. Additionally, we studied an independent sample of 1,693 T1D patients collected across the UK and 1,828 control subjects selected from the 1958 British Birth cohort [18]. This case-control collection would have over 99% power to detect OR = 1.25 and 1.5 for alleles at frequencies of 40% and 7%, respectively, at α = 0.05.
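The case-control power figures quoted above can be approximated with a simple calculation. The sketch below is not the authors' power calculation, whose method is not stated; it assumes a two-sided allelic test comparing allele frequencies between cases and controls with a normal approximation, takes the stated allele frequency as the control frequency, and derives the case frequency from the allelic odds ratio under a multiplicative model. The function name `allelic_power` is illustrative.

```python
from statistics import NormalDist


def allelic_power(n_cases, n_controls, control_freq, odds_ratio, alpha):
    """Approximate power of a two-sided allelic case-control test.

    Uses a normal approximation to the difference in allele frequencies
    (an assumed method; each subject contributes two alleles).
    """
    std_normal = NormalDist()
    control_odds = control_freq / (1 - control_freq)
    case_odds = odds_ratio * control_odds
    case_freq = case_odds / (1 + case_odds)
    # Standard error of the allele frequency difference
    variance = (case_freq * (1 - case_freq) / (2 * n_cases)
                + control_freq * (1 - control_freq) / (2 * n_controls))
    z_effect = abs(case_freq - control_freq) / variance ** 0.5
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Probability of exceeding the critical value in either tail
    return std_normal.cdf(z_effect - z_crit) + std_normal.cdf(-z_effect - z_crit)


# Sample sizes and scenarios as given in the text
for freq, odds_ratio in [(0.40, 1.25), (0.07, 1.5)]:
    power = allelic_power(1693, 1828, freq, odds_ratio, alpha=0.05)
    print(f"allele frequency {freq:.2f}, OR {odds_ratio}: power = {power:.4f}")
```

Under these assumptions both scenarios give power above 99%, in line with the statement in the text. The family-based power at α = 10⁻⁶ depends on the transmission test and family structure and is not covered by this approximation.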

Candidate gene resequencing and genotyping
We designed primers using Primer3 [19], amplified genomic DNA by PCR and sequenced PCR fragments with an ABI Big Dye Terminator v3.1 kit and an ABI3700 capillary sequencer (ABI, Foster City, CA). Sequence reads were aligned using the Staden package [20]. In 32 individuals we resequenced all CREM and SDF1 exons, exon-intron boundaries, and up to 3 kb upstream and downstream of each gene. All identified SNPs have been submitted to dbSNP [17]. SNP genotyping was carried out using Invader (Third Wave Technologies, Madison, WI), TaqMan (Perkin Elmer Applied Biosystems, Foster City, CA) or BeadArray (Illumina Inc, San Diego, CA). Microsatellite markers were genotyped as described elsewhere [13].

Statistical analysis
We assessed genotype frequencies among parents for each polymorphism using Arlequin version 2.000 [21] and found no unexpected deviation from Hardy-Weinberg equilibrium (P > 0.01). Statistical analysis was carried out within Stata version 8.1 [22]. Tag SNPs that capture common allelic variation (MAF > 0.03) with r² ≥ 0.8 were selected using the htstep, htsearch and haptag programs within Stata [23,24]. When a tag SNP approach was taken, we used a global multilocus association test implemented in the mlpop program in Stata. It tests for association between the disease and the tag SNPs due to linkage disequilibrium with one or more causal variants in the region. This test contrasts the allele frequencies of a non-redundant set of tag SNPs between cases and controls by use of Hotelling's T² test [25,26]. We did not apply multiple testing corrections in this study and all P-values reported are uncorrected.
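As an illustration of the Hardy-Weinberg check mentioned above, the following sketch applies the textbook chi-square goodness-of-fit test with one degree of freedom to genotype counts at a biallelic marker. It is an assumed stand-in for illustration only: the test procedure used by Arlequin may differ (for example, an exact test), and the genotype counts in the example are hypothetical.

```python
import math


def hwe_chi2(n_aa, n_ab, n_bb):
    """Chi-square test for Hardy-Weinberg equilibrium at a biallelic locus.

    Returns (chi-square statistic, P-value with 1 d.f.). This is the
    textbook approximation, not the procedure used in the study.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a 1-d.f. chi-square: erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical parental genotype counts for one SNP
chi2, p_value = hwe_chi2(250, 500, 250)
print(f"chi2 = {chi2:.3f}, P = {p_value:.3f}")
```

A marker would be flagged for inspection under the threshold used in the text if its P-value fell at or below 0.01.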