Abstract

Human leukocyte antigen (HLA) system is the most polymorphic and gene dense region of human DNA that has shown many disease associations. It has been further divided into HLA classes I, II, and III. Polymorphism in HLA class II genes has been reported to play an important role in the pathogenesis of type 1 diabetes (T1D). It also showed association with T2D in different ethnic populations. However, a little is known about the relationship of HLA class I gene polymorphism and T2D. This study has evaluated the association of HLA-B (class I gene) variants with T2D in Pashtun ethnic population of Khyber Pakhtunkhwa. In the first phase of the study, whole exome sequencing (WES) of 2 pooled DNA samples was carried out, and DNA pools used were constructed from 100 diabetic cases and 100 control subjects. WES results identified a total of SNPs in HLA-B gene. In the next phase, first 5 out of reported SNPs were genotyped using MassARRAY® system in order to validate WES results and to confirm association of selected SNPs with T2D. Minor allele frequencies (MAFs) and selected SNPs×T2D association were determined using chi-square test and logistic regression analysis. The frequency of minor C allele was significantly higher in the T2D group as compared to control group (45.0% vs. 13.0%) () for rs2308655 in HLA-B gene. No significant difference in MAF distribution between cases and controls was observed for rs1051488, rs1131500, rs1050341, and rs1131285 (). Binary logistic regression analyses showed significant results for SNP rs2308655 (, , and ), while no considerable association was observed for the other 4 SNPs. However, when adjusted for these variants, the association of rs2308655 further strengthened significantly (, , and ), except for rs1131500, which has no additive effect. In conclusion, the finding of this study suggests rs2308655 variant in HLA-B gene as risk variant for T2D susceptibility in Pashtun population.

1. Introduction

Type 2 diabetes (T2D) is a multifactorial metabolic disease characterised by impaired glucose haemostasis that is primarily caused by lack of response of peripheral tissues to insulin and/or insufficient production/secretion of insulin by β cells of pancreas. Environmental and genetic variations are key risk factors for T2D [1, 2]. According to the recent report (Diabetes Atlas edition 2019) of International Diabetes Federation (IDF), approximately 415 million people around the world have diabetes, with 90% of these individuals having T2D [3]. Currently, the epicentres of diabetes mellitus (DM) prevalence have been in China, India, United States of America (USA), Pakistan, and Brazil [4]. Its prevalence rate is alarmingly high in developing countries, with more than 80% cases being reported in these nations [5]. It is projected that by 2045, the number of cases in Pakistan will exceed that of in the USA, moving the former from 4th position to 3rd in diabetes prevalence race [3]. Currently in Pakistan, 19.4 million people are living with diabetes [3]; the number was 5.5 million in 2000 [6].

Type 2 diabetes (T2D) risk is strongly heritable [79]. Genetic studies offer a powerful approach for better screening and treatment of diseases by identifying alterations at molecular level associated with physiological trait [10, 11]. Till date, a rich landscape of information about pathogenesis of T2D has been provided [12, 13]. Recent genome-wide association studies (GWAS) in different ethnic population around the world have identified hundreds of T2D susceptible genomic variants, although translating these findings into clinical practice is still challenging [7, 14]. Noteworthy, CDKAL1, HLA-B, TCF7L2, SLC30A8, HHEX, IGF2BP2, CDKN2A/B, EXT2, and FTO genes were found to be associated with T2D in different ethnic populations around the world [1521]. Very few such studies on T2D are available in Pakistani cohort. Genes like HHEX/IDE, KCNJ11, NOTCH2, WFS1, IRS1, CAPN10, KCNQ1, HNF4A, TCF-2/HNF1B, IRS-2, and TCF7L2 have been studied to be associated with T2D in Pakistani population [2226].

Pakistani population includes 5 major ethnic groups, namely, Punjabis, Pashtuns, Sindhis, Baluchis, and Muhajirs. Pashtun constitute the major population of Khyber Pakhtunkhwa (KP). They have unique cultural practices and social values, life style, and behaviours that make them a suitable population for such studies. It is hypothesized that genetic mutation spectrum of type 2 diabetes in Pakistani population is different from elsewhere [23, 27]. This study has found deleterious mutations in human leukocyte antigen-B (HLA-B) gene associated with type 2 diabetes in Pashtun ethnic population of Khyber Pakhtunkhwa, Pakistan, using high-throughput sequencing.

The study will help to better understand the pathogenesis of T2D in the study population and to devise modification strategies to overcome/control the burden of this fatal and costly disease.

2. Materials and Methods

2.1. Participants

A total of 200 individuals (diabetic and nondiabetic ) of Pashtun ethnicity belonging from different districts such as Peshawar, Charsadda, Mardan, Bannu, Kohat, Dir, and Swat of Khyber Pakhtunkhwa were included in the study. Patients were registered at Lady Reading Hospital (LRH) Peshawar, Hayatabad Medical Complex (HMC) Peshawar, and Khyber Teaching Hospital (KTH) Peshawar while control samples were collected from specially organized diabetes free medical and screening camps at Rehman Medical Institute (RMI) Hayatabad Peshawar and Diabetic Association Charsadda (DAC). The study period was from July 2018 to July 2019. Inclusion criteria for cases were (i) diabetes diagnosed as per International Diabetes Federation (IDF) etiologic classification, (ii) confirmation that subjects are of Pakistani origin and Pashtun ethnic, and (iii) age above 30 years. Exclusion criteria were (i) mentally ill patients, (ii) age below 30 years, and (iii) diabetes during pregnancy, recent infections, and presence of malignancies. Patient’s inclusion and exclusion were according to the previously defined criteria used for Asian populations [28]. Control subjects were healthy individuals from general population with blood sugar level in normal range (<99 mg/dL fasting or 120 mg/dL) [29]. Consent form and thorough demographic, family, and clinical history of all the participants were taken on a carefully designed Proforma. For illiterate participants, who did not understand English, consent form for their understanding was read and explained in local Pashtu language and then signed on his behalf by any of his/her relative/attendant. The study protocols were according to the guidelines of Helsinki declaration (1975), and ethical approval was obtained from the Ethical Committee of the Department of Pharmacy, University of Peshawar. For overall study design, see Figure 1.

2.2. Blood Sampling

Three-millilitre whole blood was collected following aseptic procedures from the median cubital vein of study individuals in EDTA tubes (properly labelled) and was stored at -10°C.

2.3. DNA Extraction and Pool Preparation

DNA was extracted from 200 μl whole blood samples of T2D patients using WizPrep DNA extraction kit (WizPrep no. W54100). DNA quantification was carried with the help of Qubit™ dsDNA HS Assay kit (Catalog No. Q32851) using Introgen Qubit™ 3, and concentration was adjusted to 10 ng/μL.

2.4. Whole Exome Sequencing

Whole exome sequencing (WES) was carried out at the Centre of Genomics, Rehman Medical Institute (RMI), Hayatabad, Peshawar. In order to simplify sequencing process and to reduce cost and time, DNA pools were constructed from 100 diabetic cases and 100 control subjects according to the previously described DNA-pooling protocols [30, 31]. Each pool contains an equimolar amount of DNA (100 ng) from each individual. DNA pools were then subjected to amplification and sequencing via HiSeq2500 platform (Illumina, San Diego, CA, USA) using paired-end libraries ( bp).

2.5. WES Analysis

A custom-built in-house NGS bioinformatics pipeline was employed in order to move raw sequencing data to final variant calls. Raw FASTQ files produced by the Illumina HiSeq were filtered to separate low quality reads () using CASAVA and trimmomatic tool [32, 33]. The filtered reads were then aligned to the reference genome (hg19/GRCh37) using BWA-mem (v 0.7.13) [34, 35]. Base recalibration was carried using GATK (v 3.2.2). Variants were called using GATK Unified Genotyper; the called variants were stored as VCF file. The variants were annotated from variant calling file using ANNOVER [36]. The resulting annotated variant file was loaded in Excel program file for easy understanding, filtering, and analysis of data.

2.6. WES Results

Exome sequencing identified a total of 1048575 SNPs including 607572 homozygous SNPs, 441003 heterozygous SNPs, 99392 deletion, 74390 insertion, 50280 exonic SNPs, 7710 missense variants, 1797 variants expressed in pancreas, and 570 possibly pathogenic mutations. Detailed WES results are shown in Figure 2.

2.7. Filtration of WES Data

In search of potential T2D, associated variant data was filtered. The flow diagram of data filtration is shown in Figure 3. The annotated Excel files were first manually curated to shortlist exonic and missense variants while synonymous variants were discarded. Resultant file was filtered for T2D susceptible genes (supplementary file 1). HLA-B gene was found to be of interest to be further investigated, among several others. Reported HLA-B variants () are shown in Table 1.

2.8. Validation Trail and Genotyping of HL-B Variants

A total of SNPs in HLA-B gene were identified using whole exome sequencing (Table 1). In order to validate whole exome sequencing results and to affirm the association of the newly identified HLA-B risk variants with T2D, SNPs were genotyped. Out of reported SNPs, first 5 SNPs (rs2308655, rs1051488, rs1131500, rs1050341, and rs1131285) were selected for further study. Genotyping of the selected candidate SNPs was carried at the Centre of Genomics, Rehman Medical Institute (RMI), Hayatabad, Peshawar, using Sequenom MassARRAY® system (Agena Bioscience, San Diego, CA) carefully following the manufacturer’s guidelines.

2.9. Statistical Analysis

Statistical data analysis was performed using IBM SPSS (Statistical Package for Social Sciences version 24). Key variables selected for analysis were age, gender, weight, geographical area, life style, smoking, exercise, occupation, diet, and variants in HLA-B gene. Quantitative variables were checked using independent samples -test. Data for quantitative variables were shown as , while categorical variables of cases and controls were compared using chi-square () test. Data for categorical variables were expressed as number and percentage. All reported SNPs were tested for Hardy-Weinberg equilibrium (HWE). Minor allele frequencies (MAFs) between cases and controls were compared using test. The association of selected individual SNPs×T2D was checked using binary logistic regression and was also adjusted for combinations within themselves to determine their combined effect. Statistically, a was considered significant.

3. Results

3.1. Subject Characteristics

Demographic and general characteristics of study subjects are shown in Tables 2 and 3. The prevalence of comorbidities like hypertension, ischemic heart disease, renal failure, retinopathy, and hypercholesterolemia was higher in cases than those of control subjects as shown in Table 2. No significant difference (-test ) in mean weight of cases and controls ( vs. ) was observed. An average normal blood pressure (120/80 mmHg) was observed in patients with T2D; however, an elevated blood pressure (>120/80 mmHg) was observed in patients with comorbidities like ischemic heart disease, renal failure and hypercholestermia. Reference to the patient’s general characteristics is shown in Table 3. Sixty-five percent (65%) of patients were males, and 35% were females. The highest incidence of T2D was reported in district Peshawar (53%) followed by district Charsadda (13%), district Mardan (12%), district Swat (08%), district Kohat (06%), district Dir (05%), and district Bannu (03%). Patients included in the study were from different occupations, and most of the patients were doing laborious jobs like driving and farming. Occupation wise, 34% female patients were housewives, 20% were labours, 16% were businessmen, 13% were government servants, 10% were drivers, while 07% were farmers. The highest (34%) prevalence was seen in female patients who were housewives. When patients were asked for family history of T2D, 94% answered “Yes” for family of diabetes while 06% answered “No” for family history of diabetes. Moreover, 85% of patients were nonexercising (sedentary life style), 15% were exercise loving, and none of the patient (0%) was attached with any sport or gym. Majority of (60%) patients were nonsmokers, 21% of patients were using Naswar (a local smokeless tobacco product), while 19% were chain cigarette smokers. The fraction of patients who were taking proper diet in order control diabetes and its complication was 50% while the rest (50%) showed no diet compliance habits.

3.2. Minor Allele Frequency (MAF) Analysis

The minor allele frequencies for rs2308655, rs1051488, rs1131500, rs1050341, and rs1131285 were compared between T2D group () and control group () using chi-square test. The frequency of the minor allele C was significantly higher in the T2D group as compared to control group (45.0% vs. 13.0%; ) for rs2308655 in HLA-B gene, while no significant differences in minor allele distribution between cases and control were observed for the other 4 SNPs. For complete detail of MAF comparison between cases and controls, consider Table 4.

3.3. Association between SNPs and T2D (SNPs×T2D)

All 5 HLA-B variants (rs2308655, rs1051488, rs1131500, rs1050341, and rs1131285) were checked for T2D association using logistic regression analysis. The frequency of minor C allele was significantly higher in the T2D group as compared to control group (45.0% vs. 13.0%) () for rs2308655 in HLA-B gene. No significant difference in MAF distribution between cases and controls was observed for rs1051488, rs1131500, rs1050341, and rs1131285 (). Binary logistic regression analyses showed significant association for SNP rs2308655 (, , and ), while no considerable association was observed for the other 4 SNPs. However, when adjusted for these variants, the association of rs2308655 further strengthened significantly (, , and ), except for rs1131500, which has no additive effect. For detailed statistical analyses, see Table 5.

4. Discussion

The human leukocyte antigen (HLA) gene complex encodes the major histocompatibility complex (MHC) in humans and linked with numerous diseases [3740] (supplementary file 2). It has three main classes: class I HLAs (A, B, and C), class II HLAs (DQ, DM, DP, DR, and DO), and class III HLAs (CSK2B, SKI2W, C4B, and PBX2) [41]. Mutations in HLA class II alleles are linked with T2D according to previous studies [4244]. A meta-analysis of 17 genome-wide association studies on T2D performed in African American cohort linked polymorphism in HLA-B (class I gene) with T2D [45]. HLA-B not only increase disease susceptibly but also involve in many adverse drug reaction [4649]. The present study evaluates the association of genetic variations in HLA-B, a highly polymorphic gene with >3000 variants [50] with T2D in Pashtun ethnic population of Khyber Pakhtunkhwa, Pakistan, using whole exome sequencing.

We performed whole exome sequencing of pooled DNA samples and genotyping of selected SNPs using MassARRAY® system to identify T2D risk variants. Using this approach, our first top notable candidate SNP reported was missense, heterozygous variant rs2308655 (c.1046G>C, p.Cys349Ser) located on short arm of chromosome 6. SIFT and PolyPhen (scores 0.03 and 0.991) predicated rs2308655 variant as deleterious and probably damaging. Second missense, heterozygous, exonic variant reported was rs1051488 (c.985G>A, p.Ala329Thr). SIFT and PolyPhen (scores 0.05 and 0.062) predicated rs2308655 variant as tolerated low confidence and of benign nature. A third missense mutation rs1131500 (c.916G>A, p.Val306Ile) reported was tested nondamaging and benign by SIFT and PolyPhen (scores of 1.00 and 0.001). The last two common missense variants reported were rs1050341 (c.652A>G, p.Ile218Val) and rs1131285 (c.610G>C, p.Glu204Gln); both variants were scored nondamaging and benign by SIFT and PolyPhen.

Our study pinpoints factors like genetic mutations, ethnicity, and environmental and demographic differences involved in the incidence of T2D. The MHC is regarded as highly gene dense and hyperpolymorphic region of human genome [41, 51]. Gene’s mutation in MHC region is involved in pathogenesis of diseases like diabetes [52], rheumatoid arthritis [53], cancer [54], multiple sclerosis [55], and psoriasis [56] and also involved in adverse drug reactions [57, 58]. We tested association of genetic polymorphism in HLA-B and T2D susceptibly in Pashtun ethnic population of Khyber Pakhtunkhwa, Pakistan. A large meta-analysis in African American cohort confirms association of T2D and HLA-B polymorphism [45]. Our results suggest strong association of rs2308655 variant in HLA-B gene with T2D. MAF comparison analysis for rs2308655 variant shows that the frequency of minor allele C is much higher in cases than in controls (). Regression analysis of rs2308655 also showed significant results (, , and ). In contrast, the other 4 SNPs (rs1051488, rs1131500, rs1050341, and rs1131285) were salient showing no association with T2D. Binary regression analysis results for rs1051488, rs1131500, rs1050341, and rs113128 in HLA-B region were found nonsignificant. However, these genes potentiate the effect of rs2308655, as odd ratios increased significantly when these variants were adjusted with it (, , and ) (for details, consider Tables 3 and 4).

The exact underlying mechanism of HLA-B polymorphism and incidence of T2D is not yet known, and comprehensive studies on large scale are needed to explore the mechanism rs2308655 variant in T2D.

Furthermore, sociodemographic analysis of cases and controls showed an increase incidence T2D in male compared to female patients. The occurrence of T2D was found high in old age patients compared to younger age patients. Comorbidities like hypertension, ischemic heart disease, renal failure, and hypercholestermia occurrence percentage were found much higher in T2D patients than controls. Most of the patients (94%) were found to have family history of diabetes. Diet control and physical activities (exercise) were recorded poor in patients. Similarly, our data shows that almost 100% patients were belonging from low-income families and doing laborious low-paid jobs.

4.1. Study Limitations

Study limitations include limited sample size, and sample collection was restricted only to Khyber Pakhtunkhwa province; our study does not include some intersecting genes that may have strong association with T2D and high percentage of comorbidities in cases. Present study despite of these limitations successfully demonstrated the role of common genetic variants in progression of complex diseases like diabetes in Pakistani cohort. Further, more advance tools of analysis have been used instead of conventional ones.

5. Conclusion

The study demonstrated the association of HLA-B gene with T2D in the study population. The findings suggest strong association of SNP rs2308655 with T2D in Pashtun ethnic population. The other HLA alleles, namely, rs1051488, rs1131500, rs1050341, and rs1131285, were shown to have no/very weak (if any) association with T2D. To our knowledge, this study is first of its kind to report T2D risk variants in patients of Khyber Pakhtunkhwa, Pakistan. To overcome this fatal and costly disease, it is recommended that similar projects should be designed on large scale to screen individuals who are genetically susceptible to T2D and awareness campaigns on genetic and environmental risk factors should be initiated in general public. This will help reduce/control the prevalence of the disease.

Data Availability

All the needed and necessary information has been provided along with the manuscript. However, the corresponding author can be contacted for any other information related to this paper.

Conflicts of Interest

Authors declare no competing interests.

Acknowledgments

Funding for this project was provided by the Government of Khyber Pakhtunkhwa, under Higher Education Research Endowment Fund, No. PMU1-22/HEREF/2014-15/Vol-IV/3408. The authors would like to thank all the control subjects (volunteers) and diabetic patients for agreeing to participate in this study. We are grateful to Dr. Muhammad Hussain Afridi, consultant endocrinologist of Hayatabad Teaching Hospital Peshawar, Dr. Shaukat Hayat, endocrinologist of AIMS Diabetes Hospital & Research Centre, Dr. Sobia Afridi, consultant endocrinologist of Lady Reading Hospital Peshawar, Dr. Johar Ali, head of Centre of Genomics, Rehman Medical Institute, Hayatabad, Peshawar, and Dr. Syed Jamal Akbar, consultant endocrinologist of Diabetes Association Charsadda for their kind collaboration in patient demographic and blood sample collection. We also thank Syed Adnan Ali Haider for his kind efforts in bioinformatics analysis of data at the Centre of Genomics, Rehman Medical Institute.

Supplementary Materials

Resultant file was filtered for T2D susceptible genes (supplementary file 1). It has three main classes: class I HLAs (A, B, and C), class II HLAs (DQ, DM, DP, DR, and DO), and class III HLAs (CSK2B, SKI2W, C4B, and PBX2) (supplementary file 2). (Supplementary Materials)