Introduction

Type 2 diabetes is an outcome of chronic hyperglycaemia and is clinically characterised by impaired insulin secretion and insulin resistance [1, 2]. With 51 million individuals with type 2 diabetes [3], India could easily become an epicentre for research into the detection of common biological mechanisms through the genomics of diabetes and its related quantitative traits and their pathological phenotypic extensions. Several genome-wide association studies (GWASs) have confirmed associations of at least 38 common genetic variations with type 2 diabetes and related intermediate traits and many more are anticipated with growing sample sizes and better array performance [4–6]. In the light of the above discoveries, it is interesting to investigate whether the top GWAS signals in European populations are also associated with diabetes in populations of different ancestry, such as South Asians [7]. Therefore, validation of single nucleotide polymorphisms (SNPs) from different loci identified through GWASs is now an important step towards genomic medicine.

In India most genetic studies of type 2 diabetes published to date have used hospital-based case–control designs for either establishing candidate genes in the metabolic pathways of interest [8, 9] or validating GWAS findings [10–13]. Further, the studies on quantitative traits of type 2 diabetes in India are few [14, 15] and have investigated a limited number of SNPs. For instance, Chauhan et al. [15] have recently validated two loci (TCF7L2 and PPARG) associated with HOMA-beta cell function (HOMA-β) from eight loci tested in 2,500 control participants from the Indo-European linguistic group of India.

A high level of population stratification and ecological diversity has complicated genetic epidemiological studies in India. It has been estimated that the population differentiation between two neighbouring populations in India may be more than three times that between northern and southern Europeans (average fixation index [F_ST] value of 0.0109 compared with F_ST = 0.003 for northern and southern Europeans) [16]. Moreover, the endogamous structure of the Indian population has maintained a high level of linkage disequilibrium (LD) because of ancestral founder effects [16]. It has been suggested that genetic predisposition to diabetes may differ in Indians compared with Europeans [15]; however, in view of the endogamous structure and the high possibility of population stratification, these results need to be interpreted with caution.

As very few studies in India have tested the association of type 2 diabetes SNPs with related quantitative traits, we conducted the present study to evaluate the relationships between 31 GWAS-confirmed type 2 diabetes SNPs from 24 genes with type 2 diabetes and four available intermediate traits: fasting glucose; fasting insulin; HOMA-insulin resistance (HOMA-IR); and HOMA-β. We used a population-based cohort of 3,089 sib pairs (2,528 for quantitative traits and 561 for type 2 diabetes) drawn from four cities across India, a design resistant to population stratification as comparisons are made within sib pairs.

Methods

Study population

The study included individuals recruited to the population-based Indian Migration Study (IMS) conducted during 2005–2007 in factories located in four cities (Lucknow, Nagpur, Hyderabad and Bangalore) of India. Briefly, each migrant worker and spouse were asked to invite one non-migrant full sibling of the same sex and closest to them in age still residing in their rural place of origin to take part in the study. This strategy resulted in rural-dwelling sibs being drawn from 20 of the 28 states in India, reflecting the migration patterns of the factory workforce and their spouses (for details see the Electronic supplementary material [ESM text]). Phenotypic information was available for 7,067 participants, of whom 6,780 individuals were true sib pairs. All the information was collected after receiving informed consent from the study participants. Ethical approval was obtained from the ethics committee of the All India Institute of Medical Sciences (AIIMS), New Delhi, India (reference number A-60/4/8/2004).

Biochemical phenotypes

After the separation of plasma and serum, samples were transported monthly to AIIMS for the biochemical assays. Fasting plasma glucose was measured on the day of blood collection by local laboratories at each site using the GOD-PAP method and RANDOX kits (Randox Laboratories, Crumlin, UK) [17]. Fasting insulin was assayed in serum samples by the ELISA method, as a solid-phase two-site enzyme immunoassay, using kits from MERCODIA (Mercodia AB, Sylveniusgatan, Uppsala, Sweden) [18]. The quality of local assays was monitored by AIIMS, and checked regularly with external standards and internal duplicate assays. For quality assurance purposes, the Cardiac Biochemistry Laboratory, AIIMS, is part of the UK National External Quality Assessment Service (www.ukneqas.org.uk/). HOMA-IR and HOMA-β scores were calculated using standard equations [19]. Participants with fasting glucose ≥7 mmol/l or already diagnosed with diabetes (with or without medication) were classified as having type 2 diabetes.
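As an illustration of the derived traits described above, the following is a minimal sketch that assumes the "standard equations" are the original HOMA formulas with glucose in mmol/l and insulin in µU/ml. The units of insulin and the function names are assumptions made for this example and are not stated in the text.

```python
def homa_ir(glucose_mmol_l, insulin_uU_ml):
    """HOMA-IR = fasting glucose (mmol/l) x fasting insulin (uU/ml) / 22.5."""
    return glucose_mmol_l * insulin_uU_ml / 22.5


def homa_beta(glucose_mmol_l, insulin_uU_ml):
    """HOMA-beta (%) = 20 x fasting insulin (uU/ml) / (fasting glucose (mmol/l) - 3.5)."""
    if glucose_mmol_l <= 3.5:
        # The formula is undefined (or negative) at or below 3.5 mmol/l.
        raise ValueError("HOMA-beta is not defined for glucose <= 3.5 mmol/l")
    return 20.0 * insulin_uU_ml / (glucose_mmol_l - 3.5)


def has_type2_diabetes(glucose_mmol_l, previously_diagnosed=False):
    """Case definition used in the text: fasting glucose >= 7 mmol/l or a prior diagnosis."""
    return previously_diagnosed or glucose_mmol_l >= 7.0
```

For example, a participant with fasting glucose 4.5 mmol/l and fasting insulin 5 µU/ml would have HOMA-IR = 1.0 and HOMA-β = 100% under these formulas.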

Genotyping and quality control

Genomic DNA was isolated from all samples using the salt precipitation method and DNA samples were plated in 96-deep-well storage plates at a uniform concentration of 10 ng/µl at the Centre for Cellular and Molecular Biology, Hyderabad, India. Each plate included eight repeat samples (∼10%) as a quality control measure. We used the Sequenom MassARRAY technology to genotype 31 type-2-diabetes-related SNPs from 24 loci, which were discovered in various GWASs [20–23], as part of a common multiplex pool. The study was proposed in 2007 and started in 2008; therefore the SNPs included in the genotyping panel are part of GWAS results published up to that time. We also included some SNPs with p values >1 × 10^−7 to validate their associations in the Indian population. The genotyping success rate was >95% and results of duplicate samples had >97% concordance. The final analysis for quantitative traits was done on 2,528 sib pairs (5,056 individuals) because 2,011 of the 7,067 individuals with phenotypic information were excluded from the analysis (i.e. 301 pairs [602 individuals] had genotyping data for fewer than 50% of the 31 genotyped SNPs; 136 pairs [272 individuals] were not true siblings; 15 individuals were singletons; and 561 pairs [1,122 individuals] were concordant/discordant for type 2 diabetes). Type 2 diabetes as a binary trait was investigated using data from 561 pairs (1,122 individuals).
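The sample accounting in the paragraph above can be checked with a few lines of arithmetic. The sketch below uses only the counts quoted in the text; the variable names are illustrative.

```python
# Individuals with phenotypic information, as reported in the text.
PHENOTYPED = 7067

# Exclusions from the quantitative-trait analysis, in individuals.
excluded = {
    "genotyped for <50% of the 31 SNPs": 301 * 2,
    "not true siblings": 136 * 2,
    "singletons": 15,
    "type 2 diabetes pairs (analysed separately)": 561 * 2,
}

total_excluded = sum(excluded.values())
quantitative_individuals = PHENOTYPED - total_excluded
quantitative_pairs = quantitative_individuals // 2

print(total_excluded, quantitative_individuals, quantitative_pairs)
```

The totals agree with the reported figures: 2,011 individuals excluded, leaving 5,056 individuals (2,528 sib pairs) for the quantitative-trait analysis.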

Sample size and power calculation

Given a sample size of 2,528 sib pairs and minor allele frequency (MAF) = 9–48%, our study had ≥80% power to detect a quantitative trait locus explaining 0.7% of genetic variation at α = 0.05 and 1.5% of genetic variation at α = 0.001, for each of the tested traits. This was plausible in the context of genetic effect estimates observed for quantitative traits in European populations, which range from r² < 0.001 to r² = 0.013 (β range 0.008 to −0.28, ESM Table 1). For type 2 diabetes as a binary outcome, based on the 561 sib pairs (487 discordant and 74 concordant pairs), we had 20% power to detect an OR of 1.17 (the average European effect size) given MAF of 23% (average MAF in this study) (ESM Tables 2 and 3). The above calculations were performed using the Genetic Power Calculator (http://ibgwww.colorado.edu/~pshaun/gpc/; option of ‘QTL association for sibships’) [24] for quantitative traits, and QUANTO 1.1 (http://hydra.usc.edu/gxe) for type 2 diabetes as a binary trait (using the case–sibling option).

Statistical analyses

Genotype frequencies were calculated in unrelated individuals residing in an urban location (a single randomly chosen member from each family) (n = 2,148) and tested for departure from Hardy–Weinberg equilibrium using the exact test implemented in PLINK (version 0.99p; http://pngu.mgh.harvard.edu/~purcell/PLINK) [25].
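To make the idea of an exact Hardy–Weinberg test concrete, the sketch below enumerates every possible heterozygote count for the observed allele counts and sums the probabilities of all configurations no more likely than the observed one. It is an illustrative standalone implementation, not PLINK's code; the function name and the example genotype counts are hypothetical.

```python
from math import exp, lgamma, log


def _log_factorial(n):
    """Natural log of n! via the log-gamma function."""
    return lgamma(n + 1)


def hwe_exact_p(n_aa, n_ab, n_bb):
    """Two-sided exact test p value for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    n_a = 2 * n_aa + n_ab  # copies of allele A
    n_b = 2 * n_bb + n_ab  # copies of allele B
    const = (_log_factorial(n) + _log_factorial(n_a)
             + _log_factorial(n_b) - _log_factorial(2 * n))

    def log_prob(het):
        # Probability of `het` heterozygotes given the allele counts.
        hom_a = (n_a - het) // 2
        hom_b = (n_b - het) // 2
        return (const + het * log(2.0) - _log_factorial(hom_a)
                - _log_factorial(het) - _log_factorial(hom_b))

    # Heterozygote counts must share the parity of the allele counts.
    probs = {het: exp(log_prob(het))
             for het in range(n_a % 2, min(n_a, n_b) + 1, 2)}
    p_obs = probs[n_ab]
    # Sum configurations at least as extreme (no more probable) as observed.
    total = sum(p for p in probs.values() if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)
```

With genotype counts in exact Hardy–Weinberg proportions, such as (25, 50, 25), the observed configuration is the most probable one and the p value is 1; a sample with no heterozygotes at all, such as (50, 0, 50), gives a vanishingly small p value.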

Association analysis was carried out using the orthogonal family-based model of Fulker et al. [26]. This is a mixed regression model in which the genetic effect is decomposed into fixed between- and within-family effects, with inference performed on the within-family effect. Shared environmental and residual genetic effects within families are modelled as a random effect. We fitted models in STATA (version 10) as the QTDT software [27], widely used for the Fulker model, is limited to analysis of quantitative traits. For drawing inferences, we used the results after adjustment for BMI because adjustment for this covariate reduced the residual variance in the regression model and provided more significant findings than the unadjusted model. In the ESM text we describe additional analyses of population structure and corresponding adjustments to association tests. Although the corrected overall critical p value using a false-discovery rate for each trait is equal to 0.002 [28], we have drawn inferences on the basis of uncorrected p values because there was strong prior evidence of associations of all the examined SNPs with type 2 diabetes [20–23].
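The between- and within-family decomposition can be illustrated for sib pairs as follows. This is a deliberately simplified sketch: it computes an ordinary least-squares within-pair slope rather than fitting the mixed model with random family effects and covariates that was fitted in STATA, and all function names and example data are hypothetical.

```python
def decompose_genotypes(pairs):
    """Split each sib's minor-allele count into between- and within-family parts.

    `pairs` is a list of (g1, g2) minor-allele counts (0, 1 or 2) per sib pair.
    Returns, per pair, ((between, within_1), (between, within_2)).
    """
    rows = []
    for g1, g2 in pairs:
        between = (g1 + g2) / 2.0  # family mean genotype
        rows.append(((between, g1 - between), (between, g2 - between)))
    return rows


def within_family_slope(pairs, phenotypes):
    """Least-squares slope of within-pair phenotype deviations on
    within-pair genotype deviations (per copy of the minor allele)."""
    num = 0.0
    den = 0.0
    for (g1, g2), (y1, y2) in zip(pairs, phenotypes):
        g_mean = (g1 + g2) / 2.0
        y_mean = (y1 + y2) / 2.0
        for g, y in ((g1, y1), (g2, y2)):
            within = g - g_mean
            num += within * (y - y_mean)
            den += within * within
    if den == 0.0:
        raise ValueError("no genotype-discordant sib pairs")
    return num / den
```

Because each phenotype is centred on its own family mean, any effect shared by both sibs (for example, ancestry-related differences between families) cancels out of the within-family estimate, which is why the design is resistant to population stratification. Pairs with identical genotypes contribute nothing to the within-family slope.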

Assuming additive models, we restricted the analysis to full sib pairs (n = 2,528 pairs) after removing the pairs concordant/discordant for type 2 diabetes (n = 561 pairs); for each SNP the major allele was the reference, and an effect estimate was calculated per copy of the minor allele. Associations between SNPs and the natural log transformed z scores of quantitative phenotypes were tested after including age, sex and location (urban or rural) as covariates. We also tested separately for interactions of the between- and within-family genetic effects with age, sex and location (rural/urban) as effect modifiers in the models for quantitative traits.
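The phenotype transformation can be sketched as below, under the assumption that "natural log transformed z scores" means taking the natural logarithm of each trait value and then standardising the logged values to mean 0 and SD 1. The function name and the example values are illustrative only.

```python
from math import log
from statistics import mean, stdev


def log_z_scores(values):
    """Natural-log transform positive trait values, then standardise them."""
    logged = [log(v) for v in values]
    centre = mean(logged)
    spread = stdev(logged)  # sample standard deviation
    return [(x - centre) / spread for x in logged]


# Hypothetical fasting glucose values in mmol/l.
z = log_z_scores([4.2, 5.0, 5.6, 6.1, 7.4])
```

On this scale a regression coefficient is expressed in SD units of the log-transformed trait per copy of the minor allele.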

For association analysis with the discrete trait of type 2 diabetes, we used a mixed logistic regression model using the same Fulker’s decomposition of genotypes on 561 sib pairs (see ESM Table 4).

Results

Estimates of MAFs for 29 of the tested SNPs were available from the HapMap-GIH (Gujarati Indians in Houston) population (http://hapmap.ncbi.nlm.nih.gov) and were found to be similar to those in the study sample (ESM Table 2). The summary characteristics of the study participants are given in Table 1 (see also ESM Table 4). As age, sex and location (urban/rural) were associated with type 2 diabetes [29], these covariates were included in further association models.

Table 1 Characteristics of participants in the Indian Migration Study

We found that 7 of the 31 SNPs were associated with at least one of the four continuous traits related to type 2 diabetes after adjusting for BMI (Table 2; see also ESM Table 5 for unadjusted results).

Table 2 Within-sib-pair association estimates for quantitative traits related to type 2 diabetes (adjusted for BMI)

Association analysis with fasting insulin and HOMA-IR

Using the within-family approach, we found that a variant of ADAM30 (ADAM metallopeptidase domain 30, rs2641348, G allele) was associated with lower levels of fasting insulin (β = −0.08, 95% CI −0.14, −0.02, p = 0.009) and HOMA-IR (β = −0.07, 95% CI −0.13, −0.006, p = 0.03). Similarly, we also found an association of the CDKN2A/B locus (rs10811661, C allele) with lower levels of fasting insulin (β = −0.09, 95% CI −0.16, −0.02, p = 0.02) and HOMA-IR (β = −0.09, 95% CI −0.16, −0.02, p = 0.02). Further, the T allele at the rs10923931 variant in Notch homologue 2 (NOTCH2) was associated with a decreased level of fasting insulin (β = −0.06, 95% CI −0.12, −0.001, p = 0.04), whereas TCF-2 (also known as HNF1B) (rs757210, A allele) was associated with higher levels of fasting insulin (β = 0.05, 95% CI −0.001, 0.11, p = 0.05) and HOMA-IR (β = 0.05, 95% CI 0.003, 0.11, p = 0.04). We observed a strong LD (r² > 0.80; range 0.81–0.92) between ADAM30 and NOTCH2 in the study population (ESM Table 6).

Association analysis with fasting glucose and HOMA-β

We observed an association of the variant of the CDKAL1 locus (rs7756992, G allele) with an increased level of fasting glucose (β = 0.009, 95% CI 0.002, 0.02, p = 0.01). In addition, the T alleles of two TCF7L2 (transcription factor 7 like 2) variants, rs7903146 (β = 0.007, 95% CI −0.0001, 0.01, p = 0.05) and rs12255372 (β = 0.01, 95% CI 0.004, 0.02, p = 0.003), predicted higher fasting glucose levels. In contrast, the CXCR4 locus (rs932206, T allele) was associated with a lower level of fasting glucose (β = −0.009, 95% CI −0.02, −0.0005, p = 0.04), while ADAM30 (rs2641348, G allele) (β = −0.05, 95% CI −0.10, −0.01, p = 0.01) and the CDKN2A/B locus (rs10811661, C allele) (β = −0.05, 95% CI −0.11, −0.004, p = 0.03) were associated with reduced beta cell function (HOMA-β).

Association analysis with type 2 diabetes

In the present study, of the 31 SNPs tested for association with type 2 diabetes (ESM Table 2), only a variant in the THADA locus (rs7578597) was found to be associated with OR 1.5 (95% CI 1.04, 2.22, p = 0.03) and this association remained significant even after adjustment for BMI. The remaining SNPs in this study had estimated ORs of 1.1 or less (ESM Table 2) and we had insufficient power (because of the population-based design) to detect such a low effect size.

After applying corrections for 31 SNPs (p < 0.002 using the false discovery rate), we did not find any significant association with the studied phenotypes (fasting glucose, fasting insulin, HOMA-IR, HOMA-β). For comparison, we performed an additional analysis using Fulker’s between-sib-pair model for total association with adjustment for a limited number of population clusters identified by STRUCTURE [30] (see ESM Fig. 1), but these results did not differ appreciably from the within-family association.
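The text does not state which false-discovery-rate procedure was applied [28]. As an illustration only, the sketch below assumes the Benjamini–Hochberg step-up rule with 31 tests per trait, and uses the four fasting glucose p values reported above, assuming that every unreported SNP had p > 0.05. The function name is hypothetical.

```python
def bh_rejections(p_values, n_tests, alpha=0.05):
    """Number of hypotheses rejected by the Benjamini-Hochberg step-up rule.

    `p_values` may list only the p values <= alpha, provided `n_tests` is the
    full number of tests: a p value above alpha can never fall below its
    rank-based threshold (rank * alpha / n_tests <= alpha).
    """
    rejected = 0
    for rank, p in enumerate(sorted(p_values), start=1):
        if p <= rank * alpha / n_tests:
            rejected = rank  # keep the largest rank that passes
    return rejected


# Fasting glucose p values reported in the Results (rs12255372, rs7756992,
# rs932206, rs7903146); the remaining 27 SNPs are assumed to have p > 0.05.
fasting_glucose_p = [0.003, 0.01, 0.04, 0.05]
n_rejected = bh_rejections(fasting_glucose_p, n_tests=31)
print(n_rejected)
```

Under these assumptions the strictest threshold is 0.05/31 ≈ 0.0016, which rounds to the critical value of 0.002 quoted in the text, and the smallest reported p value (0.003) does not pass it, consistent with no association surviving correction.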

Discussion

We report associations of ADAM30 and CDKN2A/B with fasting insulin levels, HOMA-IR and HOMA-β in India. Further, we report the associations of TCF-2 and NOTCH2 with fasting insulin level, and TCF-2 with HOMA-IR. We also detected that CXCR4, CDKAL1 and TCF7L2 variants influence fasting glucose levels in the study population. In addition, we found the association of THADA with type 2 diabetes in the Indian population. The consistency of detecting association of these regions in the Indian population shows the importance of GWAS signals in quantifying the effects of type 2 diabetes.

In order to understand the mechanisms involved in beta cell function and glucose homeostasis, and to complement the genetic analysis of type 2 diabetes, large-scale meta-analysis of GWASs on continuous glycaemic phenotypes has been performed in the last 2 years [4, 5, 31]. These studies have found new genetic variants associated with quantitative metabolic phenotypes and have increased our understanding of the pathophysiology of type 2 diabetes. In addition, several studies have also shown positive associations of genetic variants related to type 2 diabetes with the continuous traits of this disease [32–34]. In India, few type 2 diabetes SNPs have been evaluated with quantitative phenotypes, so the relevance of these SNPs for glycaemic traits and their relation to the pathogenesis of type 2 diabetes is unclear for this population.

Of the 31 SNPs tested for association, we found that the minor alleles of ADAM30 and CDKN2A/B variants predict reduced fasting insulin, HOMA-IR and HOMA-β, suggesting their role in insulin sensitivity and beta cell dysfunction. Several studies did not find any association of ADAM30 with quantitative traits related to type 2 diabetes [33–35], whereas for CDKN2A/B a few studies have shown evidence of impaired [36] and improved [37] beta cell function. Similar to European population studies, we also observed a strong LD between ADAM30 and NOTCH2 [23], and an association of NOTCH2 with lower levels of fasting insulin [5]. Importantly, the variant rs2641348 in ADAM30 is a non-synonymous SNP (L359P) and could be directly evaluated in functional genomic studies to determine the mechanism regulating insulin secretion and resistance [23]. Chauhan et al. [15] studied CDKN2A/B in an Indian population but did not observe an association with type-2-diabetes-related intermediate traits. Only one small study from India has shown an association of CDC123/CAMK1D with fasting insulin levels in a Sikh population, but was unable to find any evidence for CDKN2A/B and NOTCH2, probably because of the low power of the study [14]. CDKN2A/B has a role in pancreatic islets [38], and possibly influences insulin secretion by decreasing beta cell mass [36].

Gudmundsson et al. [39] observed the protective role of TCF-2 variant against type 2 diabetes among Europeans and confirmed their findings in African and Chinese populations; however, these observations could not be replicated in an independent study in a Han Chinese population [40, 41]. Notably, we have reported the association of TCF-2 with higher levels of fasting insulin and HOMA-IR, suggesting a role in insulin resistance.

CDKAL1 plays an important role in insulin secretion from pancreatic beta cells [42]. In a recent large meta-analysis of GWASs in non-diabetic individuals from European populations, CDKAL1 was not associated with any of the quantitative traits [5]. Chauhan et al. failed to find any association of CDKAL1 with glycaemic traits in Indians [15]. However, we found an association of the CDKAL1 locus with increased levels of fasting glucose, as observed by Voight et al. [4]. Further, Cauchi et al. [43] found an association of CXCR4 with type 2 diabetes in a French population, and in the present study, we found an association of the CXCR4 locus with lower levels of fasting glucose, suggesting its role in glucose homeostasis.

We also replicated in our population an association of TCF7L2 variants with high fasting glucose as reported by Dupuis et al. [5] for Europeans. Recently, Chauhan et al. [15] found associations of eight SNPs with type 2 diabetes and also associations of PPARG and TCF7L2 with HOMA-β. As reported elsewhere, TCF7L2 is also the most promising gene harbouring common variations associated with type 2 diabetes in India because of the replication of results in all the studies [10–12]. However, evidence for an association with fasting glucose was positive only in the study by Chandak et al. [10], being negative in most Indian studies [11, 14, 15]. As our study has a large sample size and a design resistant to population stratification, it could therefore be seen as providing strong evidence in support of the role of TCF7L2 in the homeostasis of fasting glucose levels in India.

Despite having reasonable power, we could replicate associations of only seven SNPs with various glycaemic phenotypes. Though difficult to explain, it is possible that the remaining 24 SNPs might not play any role in influencing these continuous traits in Indians. An alternative, perhaps more likely, explanation is that the underlying causal variant is less well tagged by these SNPs in the Indian population owing to different allele frequencies and the extent of LD compared with Europeans.

As observed in the meta-analysis of GWASs by Zeggini et al. [23], we also demonstrated a strong effect (OR 1.5) of a THADA variant in predisposition to type 2 diabetes in our study population. This locus was previously studied among an Asian Sikh population, but no significant results were observed [14]. This variant of THADA (rs7578597) results in a missense mutation (T1187A) [23, 44] and is also known to lower measures of beta cell function [34], and hence it is a likely causal variant. The association of THADA with type 2 diabetes remained significant even after adjustment for BMI, suggesting an independent effect of the variant in predisposition to type 2 diabetes (ESM Table 2). In addition to low power for type 2 diabetes as a binary trait in our study, the lack of association of other SNPs may reflect their low effect sizes in this population as our sample estimates were 1.1 or lower. For instance, the loci FTO, NGN3 (also known as NEUROG3), TSPAN8 (p value > 0.05) could present false negative values because of the sample sizes available (ESM Table 2).

In addition, while comparing the effect sizes for type 2 diabetes and its related quantitative traits with Europeans (see ESM Tables 1 and 3), we did not find any evidence that the risk alleles from our Indian population were acting in a different direction from that in the European population (see ESM Table 7). This suggests that further associations would have been present in our sample but failed to reach conventional levels of significance.

The major strengths of our study are the research design and the high number of SNPs associated with type 2 diabetes examined. In the Indian context, this is probably the largest study investigating the effect of 31 genetic variants on various intermediate traits predictive of type 2 diabetes. As the participants came from different parts of India, we can claim good representation of the Indian population and at the same time have been able to control for population stratification because of the model of sib-pair analysis. The present study is the first to report the associations of ADAM30, NOTCH2, CDKN2A/B and TCF-2 with various quantitative glycaemic traits in an Indian population. The associations of CDKAL1 and CXCR4 with fasting glucose in India are also observed for the first time in our study. We replicated the strong effects of THADA in predisposition to type 2 diabetes in this Indian population. Furthermore, the high level of population heterogeneity as described by Reich et al. [16], population stratification and small sample sizes have plagued case–control-based genetic association studies by providing spurious associations and inconsistent results of replication studies. In this challenging situation, the present study has shown the benefit of a family-based design for conducting genetic epidemiological studies in India. The major limitation of our study was that, because of the cohort design, we had limited power for analysing binary traits and hence could not replicate published results for the majority of genes associated with type 2 diabetes. Although correction for multiple testing is not required for variants that already have strong priors of association with type 2 diabetes, the possibility exists that some of our weaker associations may represent false-positive findings.

In conclusion, using a large sample of sib pairs, we validated the association of seven type-2-diabetes-related loci with various continuous glycaemic traits in the Indian population. We also validated the strong association of THADA with type 2 diabetes in our study population. Furthermore, in the presence of a high level of population heterogeneity and substructure, we have also provided evidence of the utility of the sib-pair design for conducting genetic epidemiological studies for late-onset complex diseases in the Indian population.