Association of NOD1, CXCL16, STAT6 and TLR4 gene polymorphisms with Malaysian patients with Crohn’s disease

Crohn’s disease (CD) is a prominent type of inflammatory bowel disease (IBD) that can affect any part of the gastrointestinal tract. CD is known to have a higher prevalence in Western countries, but the number of cases in Asia, including Malaysia, has been increasing over the past decades. There is therefore a need to investigate the underlying causes of CD, which may shed light on its prevention and treatment. In this study, genetic polymorphisms in the NOD1 (rs2075820), CXCL16 (rs2277680), STAT6 (rs324015) and TLR4 (rs4986791) genes were examined in a total of 335 individuals (85 CD patients and 250 healthy controls) using a PCR-RFLP approach. No significant association was observed between NOD1 rs2075820 or STAT6 rs324015 and the onset of CD in the studied cohort. However, the G allele of CXCL16 rs2277680 was found to have a weak association with CD (P = 0.0482; OR = 1.4310). TLR4 rs4986791 was also significantly associated with CD. Both the homozygous C genotype (P = 0.0029; OR = 0.3611) and the C allele (P = 0.0069; OR = 0.4369) were observed to confer protection against CD. On the other hand, the heterozygous C/T genotype was a risk genotype (P = 0.0015; OR = 3.1392). Further ethnic-stratified analysis showed that the significant associations in CXCL16 rs2277680 and TLR4 rs4986791 were accounted for by the Malay cohort. In conclusion, the present study reports two CD-predisposing loci in Malay CD patients. However, these loci were not associated with the onset of CD in Chinese and Indian patients.


INTRODUCTION
Both Crohn's disease (CD) and ulcerative colitis (UC), the major sub-types of inflammatory bowel disease (IBD), are regarded as important global health issues, especially in Western countries (Hilmi, Tan & Goh, 2006). Although CD and UC share some clinical symptoms, they can be differentiated by the locations they affect. CD patients often suffer from inflammation at multiple sites along the entire gastrointestinal tract, including the mouth and anus. Occasionally, CD also causes complications that involve other organs and systems, such as the eyes, joints, blood, skin, and endocrine system. On the other hand, UC displays relatively restricted clinical manifestations, which are always localised to the colon and rectum. To date, there is no cure for either CD or UC. Treatments and medical procedures are given to patients mainly for symptomatic relief and maintenance of remission, and may come at the cost of side effects such as nausea and skin rashes (Sandborn et al., 2007).
Despite the numerous extensive studies conducted over the past decades to find the etiology of CD, its exact cause remains inconclusive. However, it is generally accepted that, like other autoimmune diseases, the onset of CD could be triggered by the interaction of two major factors, i.e., genetics and environment (Baumgart & Carding, 2007). The increased disease concordance rate in monozygotic twins and the higher risk of disease development in siblings of affected individuals highlight the importance of genetic predisposition in the disease development (Orholm et al., 2000). A number of genes have been identified through large-scale genome-wide association studies (GWAS) and meta-analyses as significantly associated with the development of CD (Duerr et al., 2006; Hampe et al., 2007). However, none of these genes was found to be the sole contributor to the disease, suggesting that disease development and progression are polygenic.
CD is commonly observed in Western regions, e.g., northern Europe, the United Kingdom, and North America (Hilmi, Tan & Goh, 2006). The prevalence rate of CD has been relatively low in developing countries in Asia. Recently, however, several studies have reported a rising trend of CD in Sri Lanka, Hong Kong, Japan, and Singapore (Morita et al., 1995; Lee et al., 2000; Niriella et al., 2010). Scientists postulate that this phenomenon might result from the adoption of modern lifestyles from Western counterparts (Thia et al., 2008). The overall prevalence rate of CD in Malaysia is 26 per 100,000 persons (Tan & Goh, 2005; Hilmi, Tan & Goh, 2006). The prevalence rate differs among the three main ethnic groups that make up the multiracial society of Malaysia, i.e., Malay, Chinese, and Indian. CD appears most commonly in Indians, followed by Chinese and Malays, with prevalence rates of 52.6, 26.9, and 9.2 per 100,000 persons, respectively (Tan & Goh, 2005; Hilmi, Tan & Goh, 2006). Some gene and CD association studies have been conducted previously (Chua et al., 2011; Chua et al., 2012; Chua et al., 2015); however, the actual genetic architecture of CD in the Malaysian populations remains unclear.
Nucleotide-binding Oligomerization Domain-containing 1 (NOD1) protein is a member of the NOD-like receptor (NLR) family. It is involved in the activation of NF-κB and caspase-9, which are important to immune response and cellular apoptosis, respectively (Inohara et al., 1999). NOD1 recognises the invasion of bacteria by reacting with a unique dipeptide, γ-D-glutamyl-meso-diaminopimelic acid (iE-DAP), that is present in bacterial peptidoglycan (Chamaillard et al., 2003). The gene that encodes the NOD1 protein is located on chromosome 7p14. Genetic variations in this gene have been associated with asthma and IBD (McGovern et al., 2005). NOD1 rs2075820 involves the replacement of glutamic acid (E) by lysine (K) at amino acid position 266. This SNP has been shown to increase the susceptibility to gastric mucosal inflammation following Helicobacter pylori infection, possibly due to an altered plasma level of NF-κB (Kara et al., 2010).
Chemokine (C-X-C motif) Ligand 16 (CXCL16) protein is produced by dendritic cells and splenic red pulp cells (Wilbanks et al., 2001). It is also expressed in macrophages. It interacts with CXC chemokine receptor 6 (CXCR6) during inflammation and recruits CXCR6-expressing cells, such as naive CD8 cells, intraepithelial lymphocytes, natural killer T cells, and activated CD8 and CD4 T cells, to the affected site (Matloubian et al., 2000). CXCL16 is constantly expressed in human cells in two forms, i.e., membrane-bound and soluble. The membrane-bound CXCL16 aids the adhesion of bacteria for phagocytosis, whereas the soluble form induces the migration of activated T cells (Abel et al., 2004; Gough et al., 2004; Nakase et al., 2012). The gene for human CXCL16 is on chromosome 17p13. One study found that the serum level of the soluble form of CXCL16 was increased more than 10 times in CD patients compared to controls (Lehrke et al., 2008). A previous study demonstrated the association of the CXCL16 Ala181Val (rs2277680) variant with a severe disease phenotype in young CD patients (Seiderer et al., 2008).
Signal Transducer and Activator of Transcription 6 (STAT6) is a transcription factor in the STAT family (Hamlin et al., 1999). Unlike other members of the family, STAT6 is activated by interleukin-4 (IL-4) through tyrosine phosphorylation. The phosphorylated STAT6 dimerizes and translocates to the cell nucleus. It binds to specific DNA sequences and activates the transcription of certain IL-4-induced genes, which leads to the initiation of the humoral immune response by T-helper 2 (Th2) cells. Th2 cells promote the production of IgE and the proliferation of mast cells and eosinophils, leading to inflammation (Romagnani, 1999). The STAT6 gene is located on chromosome 12q13. The G2964A variation (rs324015) within its 3′ untranslated region (UTR) has been shown to be related to familial CD (Klein et al., 2005).
The Toll-Like Receptor 4 (TLR4) gene is located on chromosome 9q33. The TLR4 protein is highly conserved and functions as a mediator of cytokine production in the innate immune system. It also plays a vital role in pathogen recognition. TLR4 senses lipopolysaccharides (LPS) from Gram-negative bacteria. Binding of bacterial LPS to TLR4, together with accessory proteins such as LPS-binding protein, MD2, and CD14, activates the production of NF-κB and other cytokines (Philpott & Girardin, 2004; Lavelle et al., 2010). Alteration of the protein structure due to mutations or SNPs results in differential responsiveness to LPS and may induce a massive inflammatory reaction. TLR4 T399I (rs4986791) has been related to a weakened LPS response and to other diseases, i.e., diabetes, rheumatoid arthritis, Alzheimer's disease, and renal and cardiovascular disease (Arbour et al., 2000; O'Neill, Bryant & Doyle, 2009). It was also found to be associated with the development of CD and UC (Torok et al., 2004; Zouiten-Mekki et al., 2009).
In the present study, we aimed to investigate the distribution of four SNPs in the NOD1 (rs2075820), CXCL16 (rs2277680), STAT6 (rs324015) and TLR4 (rs4986791) genes in Malaysian patients with CD. We report the association of alleles and genotypes of these SNPs with the onset of CD in the Malaysian population.

Subject recruitment
The studied samples consisted of 335 individuals from the three major ethnic groups in Malaysia, i.e., Malay, Chinese, and Indian, living around Kuala Lumpur city. A total of

SNP genotyping
A blood sample was collected from each participant in an EDTA tube. Genomic DNA was extracted from all samples via a conventional phenol-chloroform extraction method (Chua et al., 2009). A post-extraction quality check was carried out by spectrophotometry to ensure high-quality DNA for the subsequent genotyping experiments. SNP genotyping was carried out via the polymerase chain reaction-restriction fragment length polymorphism (PCR-RFLP) method. The DNA regions covering the SNPs of interest were amplified by standard PCR in a thermal cycler (Veriti; Life Technologies, Carlsbad, CA, USA). The final reaction mixture contained 1X Taq buffer, 0.375 µM of the respective forward and reverse primers, 0.075 mM dNTP mix, 0.75 unit of Taq DNA polymerase, and 100 ng of template DNA, topped up to a final volume of 20 µl with sterile distilled water. The same PCR cycling parameters were used for all SNPs apart from the annealing temperature: initial denaturation at 95 °C for 12 min, followed by 35 cycles of 94 °C for 30 s, annealing at 60 °C (TLR4 rs4986791), 64 °C (NOD1 rs2075820 and CXCL16 rs2277680), or 68 °C (STAT6 rs324015) for 30 s, and 72 °C for 30 s, with a final extension at 72 °C for 7 min. The amplified products were digested overnight at 37 °C with 5 units of the respective restriction enzyme. Primer sequences, amplicon sizes, and restriction enzymes used for SNP genotyping are summarised in Table 2. The digested products were run on a 2.5% (w/v) agarose gel stained with ethidium bromide and visualised using a UV transilluminator (Figs. S1-S4). Allelic and genotypic distributions of the SNPs were determined by direct counting.
The genetic data generated by the PCR-RFLP method were validated via a PCR-resequencing approach. Representative samples of each observed genotype for the four studied SNPs were selected and amplified in standard PCR reactions using SNP-specific primers. The amplified products were purified and subjected to DNA sequencing. The genotype of each selected sample was determined based on the signal patterns at the particular SNP location in the electropherograms.

Statistical analysis
The allelic and genotypic frequencies of these polymorphisms were calculated. Deviation from Hardy-Weinberg equilibrium was assessed using Arlequin V3.11 software (Excoffier & Lischer, 2010). Statistical power was calculated with Quanto V1.2 (USC Biostats, Los Angeles, CA, USA), based on overall disease risk and allelic and genotypic frequencies.
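As an illustration of what a Hardy-Weinberg check involves, the sketch below compares observed genotype counts with the counts expected from the allele frequencies. It uses a simple chi-square goodness-of-fit statistic, which is a stand-in and not necessarily the procedure implemented in Arlequin; the function name `hwe_chi_square` and the genotype counts are hypothetical and are not taken from the study's tables.

```python
def hwe_chi_square(n_aa, n_ab, n_bb):
    """Return (expected genotype counts, chi-square statistic) for a biallelic SNP."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of the first allele
    q = 1.0 - p                      # frequency of the second allele
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Classes with an expected count of zero (monomorphic SNP) are skipped.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return expected, chi2


# Hypothetical control genotype counts, for illustration only.
expected, chi2 = hwe_chi_square(50, 40, 10)

# With 1 degree of freedom, a statistic above 3.841 corresponds to P < 0.05.
deviates = chi2 > 3.841
```

For small samples or rare genotypes, an exact test is usually preferred over the chi-square approximation shown here.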
Fisher's exact test, odds ratios (OR), and 95% confidence intervals (CI) were computed to assess the association of the polymorphisms with the onset of CD in the Malaysian population. The significance level was set at P < 0.05. P values were also corrected for multiple comparisons using the Bonferroni and False Discovery Rate (FDR) methods (Dunn, 1961; Benjamini & Hochberg, 1995).
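The association statistics named above can be sketched for a single 2×2 table of carriers versus non-carriers in patients and controls. This is a minimal standard-library illustration, not the authors' software: the two-sided P value sums all tables no more probable than the observed one, and the confidence interval uses the Woolf (logit) method, which is an assumption because the text does not state how the interval was derived. The function names and counts are hypothetical.

```python
from math import comb, exp, log, sqrt


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x carriers in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    return sum(p for p in map(prob, range(lowest, highest + 1))
               if p <= p_obs * (1 + 1e-9))


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (logit) confidence interval; all cells must be > 0."""
    odds_ratio = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return odds_ratio, exp(log(odds_ratio) - z * se), exp(log(odds_ratio) + z * se)


# Hypothetical counts: carriers / non-carriers in patients (a, b) and controls (c, d).
a, b, c, d = 12, 8, 5, 15
p_value = fisher_exact_two_sided(a, b, c, d)
odds_ratio, ci_low, ci_high = odds_ratio_ci(a, b, c, d)
```

An OR above 1 with a CI that excludes 1 points to a risk variant, and an OR below 1 to a protective one, which is how the genotype and allele results are read in this study.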

NOD1 rs2075820 G/A polymorphism
No significant association was observed for the rs2075820 G/A polymorphism in the NOD1 gene in the present study, based on the allelic and genotypic data (Table 3). Both G and A alleles were evenly distributed in CD patients and controls. Although the homozygous A genotype was about twice as frequent in controls (12.0%) as in patients (5.9%), the difference was not significant (P = 0.1498). Further statistical analysis was performed after stratifying the samples into the three main ethnic groups, i.e., Chinese, Malay, and Indian (Table 4). Although the frequency of the homozygous A genotype was over three times higher in the Indian CD patients, no significant association was observed.

CXCL16 rs2277680 A/G polymorphism
As shown in Table 5, the G allele of CXCL16 rs2277680 was observed to have a weak association with CD (P = 0.0482; OR = 1.4310). However, the significance was lost after correction by Bonferroni and FDR. No significant association was observed for the genotypes, although the homozygous G genotype had a borderline P value of 0.0713. These observations suggest that the G allele may serve as a predisposing factor to CD in the Malaysian cohort. Our postulation was supported by ethnic-stratified analysis (Table 6). Significant associations were found in the Malay cohort for both the G allele (P = 0.0306; OR = 2.2227) and the homozygous G genotype (P = 0.0154; OR = 4.4423). However, the statistical power computed by Quanto software was only 0.5931 for the association. In addition, the corrected P values for both the G allele and the homozygous G genotype were not significant (P > 0.05). On the other hand, no significant association was observed in Chinese and Indian CD patients.
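The effect of multiple-comparison correction on a borderline P value can be sketched as follows. The family of tests is an assumption made purely for illustration: 0.0482 and 0.0069 are allelic P values quoted in the text, while 0.60 and 0.35 are made-up placeholders, and the actual set of tests the corrections were applied to is not stated here. The function names are hypothetical.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted P values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg (FDR) adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Assumed family of four tests; the last two values are placeholders.
raw = [0.0482, 0.0069, 0.60, 0.35]
bonf = bonferroni(raw)
fdr = benjamini_hochberg(raw)
```

Under this assumed family, the value 0.0482 rises above 0.05 with both methods, whereas 0.0069 stays below it, which mirrors the pattern described for the CXCL16 and TLR4 results.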

STAT6 rs324015 A/G polymorphism
Similar to NOD1 rs2075820, no significant association was observed in CD patients for either the allelic or the genotypic distribution (Tables 7 and 8). The A and G alleles were found almost equally in the patient and control groups. Although the homozygous A genotype was present at a higher percentage in the control individuals, and was over two times more frequent in the Malay healthy controls than in the Malay patients, no significant relationship was seen in the Chinese, Malay, or Indian groups.

TLR4 rs4986791 C/T polymorphism
Strong associations were observed between TLR4 rs4986791 and CD in the Malaysian patients (Table 9). The C allele (P = 0.0069; OR = 0.4369) and the homozygous C genotype (P = 0.0029; OR = 0.3611) were found to confer a protective effect against the onset of CD. On the other hand, the heterozygous C/T genotype was present at a higher percentage in CD patients (22.4%) than in controls (8.4%), and a significant association was observed (P = 0.0015; OR = 3.1392). These significant associations were supported by high statistical power as calculated by Quanto based on the sample size, i.e., 0.9691 and 0.9993 for the homozygous C genotype and the heterozygous C/T genotype, respectively. Further stratification analysis showed that these significant associations were contributed by the Malay ethnic group, but not by the Chinese and Indian groups (Table 10). The C allele (P = 0.0003; OR = 0.0646) and the homozygous C genotype (P = 0.0002; OR = 0.0516; Quanto = 0.8807) were found to be protective in the Malays. In contrast, the heterozygous C/T genotype was significantly associated with CD in Malay patients (P = 0.0002; OR = 19.3846; Quanto = 0.8897). Interestingly, the SNP appeared to be monomorphic in the Chinese cohort, where all individuals carried the homozygous C genotype. The homozygous T genotype was seen only in Indians, at a frequency of less than 5%. All the significant associations observed for the TLR4 rs4986791 polymorphism remained significant after correction by Bonferroni and FDR.
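The odds ratio reported for the heterozygous C/T genotype in the overall cohort can be checked from the percentages quoted above. The sketch below back-calculates carrier counts from the cohort sizes given in the abstract (85 patients, 250 controls); the rounded counts are inferred here and are not read from Table 9, and the helper name `odds_ratio` is illustrative.

```python
n_patients, n_controls = 85, 250

# Carrier counts inferred from 22.4% of 85 patients and 8.4% of 250 controls.
ct_patients = round(0.224 * n_patients)
ct_controls = round(0.084 * n_controls)


def odds_ratio(case_yes, case_no, ctrl_yes, ctrl_no):
    """Odds of carrying the genotype in cases divided by the odds in controls."""
    return (case_yes * ctrl_no) / (case_no * ctrl_yes)


or_ct = odds_ratio(
    ct_patients, n_patients - ct_patients,
    ct_controls, n_controls - ct_controls,
)
```

With 19 of 85 patients and 21 of 250 controls inferred as C/T carriers, the ratio is (19 × 229) / (66 × 21) ≈ 3.139, in agreement with the reported OR of 3.1392.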

Validation study and Hardy-Weinberg equilibrium
The accuracy of the present PCR-RFLP study was validated by a PCR-resequencing approach. Fifteen representative samples of each genotype (except for the homozygous T genotype of the TLR4 gene, for which only five samples were available) were selected and amplified using specific primer sets in a conventional PCR. The amplicons were purified and subjected to bi-directional sequencing. The genotypes of all examined samples were determined based on the fluorescence signals on the electropherograms (Figs. S5-S8). No discrepancy was observed between the genotypes determined by the PCR-RFLP and PCR-resequencing methods. Hardy-Weinberg equilibrium was also computed for the control groups of all three ethnic groups (Chinese, Malay, and Indian), and no deviation from the equilibrium was found.

DISCUSSION
The investigation of NOD1 rs2075820 in Malaysian CD patients showed no significant association between the SNP and the onset of CD. Our result contrasts with a genetic study conducted on a Hungarian population, where the A allele was significantly associated with CD. The rs2075820 SNP lies in an evolutionarily conserved region that consists of 171 amino acids. This region interacts with a number of other domains, such as CARD, WD40 repeat and LRR, and functions to hydrolyze ATP or GTP (Koonin & Aravind, 2000). The A allele was suggested to decrease the helix-forming potential and affect the binding affinity of ATP or GTP to the domain. NOD1 polymorphisms have also often been linked to inflammatory conditions following Helicobacter pylori infection, such as duodenal ulcer and gastritis (Kara et al., 2010). Therefore, NOD1 polymorphisms may not be related directly to the onset of CD, but rather through a complicated mechanism that is triggered by infection.
CXCL16 rs2277680 was significantly associated with CD in the Malaysian cohort. Overall, the G allele was found to have a weak predisposing effect on CD. When the data were analyzed according to ethnic group, a significant association was observed only in the Malays. Carriers of the G allele and the homozygous G genotype had two- and four-fold higher risk of developing CD, respectively, in the Malay cohort. However, the statistical power at this sample size and the corrected P values did not support the association. Therefore, more Malay CD patients should be recruited in future studies to confirm this observation. No significant association was established between the G allele and CD in Chinese and Indian patients. This could be due to differences in the genetic background of these ethnic groups. An increased serum level of CXCL16 has been reported in CD patients, suggesting a pro-inflammatory effect of CXCL16 through the stimulation of C-reactive protein (Lehrke et al., 2008). CXCL16 rs2277680 involves the substitution of alanine by valine, which may enhance the efficiency of the CXCL16 protein and lead to the development of CD. The SNP was reported to have no association with susceptibility to CD in a Caucasian population (Seiderer et al., 2008). However, it was shown to influence the clinical development of the disease, such as an early age of onset.
The STAT6 gene is situated well within the IBD2 susceptibility locus on chromosome 12. It is also one of the components of the pathway that triggers T-cell differentiation in inflammatory responses. Therefore, it may be a potential predisposing gene for the development of CD. In the present study, we did not observe a significant association of the rs324015 polymorphism, located in the 3′ UTR of STAT6, with the onset of CD in the Malaysian cohort. Similar findings were reported in Caucasian patients with CD from the Netherlands (De Jong et al., 2003; Xia et al., 2003). Interestingly, the G allele and the homozygous G genotype were found to be significantly increased in CD patients without any variation in the CARD15 gene (Klein et al., 2005). The relationship of STAT6 rs324015 with asthma has also been studied extensively, but thus far no significant association has been reported (Li et al., 2013; Qian et al., 2014). Based on these genetic studies, it is plausible to suggest that STAT6 may not have a leading role in the development of inflammatory diseases.
In the overall Malaysian cohort, the TLR4 rs4986791 polymorphism was shown to have a very strong association with CD. The homozygous C genotype confers protection against the disease, whereas heterozygous C/T carriers had roughly three-fold higher odds of developing CD. Stratification analysis revealed that these associations were attributable to the Malay ethnic group, which showed more stringent levels of significance (P ≤ 0.0003). Computation via Quanto software indicates that the sample size is sufficient to provide high statistical power for the association analysis. In addition, the corrected P values for all associations remained significant. These results further strengthen the association of the TLR4 polymorphism with the onset of CD in the Malay population. However, no significant association was observed in the Chinese and Indian cohorts. It is interesting to note that the distribution of TLR4 rs4986791 varied greatly in our study. It was monomorphic in the Chinese, in whom only the homozygous C genotype was detected. Only two genotypes, homozygous C and heterozygous C/T, were seen in the Malays, and all three genotypes were observed in the Indians. This may be due to differences in the genetic background of the ethnic groups, and provides a likely explanation for the many conflicting case-control studies of TLR4 rs4986791 in diverse populations. Many genetic studies could not establish a link between the TLR4 rs4986791 polymorphism and IBD in patients from various Asian and Western countries (Browning et al., 2007). Several meta-analyses, however, concluded that the T allele increases the risk of CD and/or UC, especially in Caucasian populations (Shen et al., 2010; Senhaji et al., 2014; Cheng et al., 2015). The T allele was suggested to impact the TLR4 protein by affecting the transcription and expression of the gene (Cheng et al., 2015).
In conclusion, we report significant associations of two loci (CXCL16 rs2277680 and TLR4 rs4986791) with CD in Malay patients in Malaysia. However, no significant association was observed for Chinese and Indian patients. A limitation of the current study was the small sample size of the CD patient cohort, particularly for the Malay ethnic group. Another limitation was the unknown sub-ethnic stratification within each of the three studied ethnic groups, which may lead to ambiguous associations.

ADDITIONAL INFORMATION AND DECLARATIONS
Funding
This study was supported by High Impact Research MoE Grant UM.C/625/1/HIR/MoE/E000044-20001, University of Malaya Research Grant (RG274/10HTM) and Universiti Malaya Research Fund Assistance (BK027/2015). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.