Genetic variants in RET, ARHGEF3 and CTNNAL1, and relevant interaction networks, contribute to the risk of Hirschsprung disease

Hirschsprung disease (HSCR), the most common enteric neuropathy, stands as a model for complex genetic disorders. It has recently been demonstrated that both ARHGEF3 and CTNNAL1 map to the RET-dependent HSCR susceptibility loci. We therefore sought to explore whether genetic variants within RET, ARHGEF3 and CTNNAL1, and their genetic interaction networks are associated with HSCR. Taking advantage of a strategy that combined the MassArray system and gene-gene interaction analysis with case-control study, we interrogated 38 polymorphisms within RET, ARHGEF3 and CTNNAL1 in 1015 subjects (502 HSCR cases and 513 controls) of Han Chinese origin. There were statistically significant associations between 20 genetic variants in these three genes and HSCR. Haplotype analysis also revealed some significant global P values, i.e. RET_ rs2435357-rs752978-rs74400468-rs2435353-rs2075913-rs17028-rs2435355 (P = 3.79×10-58). Using the MDR and GeneMANIA platforms, we found strong genetic interactions among RET, ARHGEF3, and CTNNAL1 and our previously studied GAL, GAP43, NRSN1, PTCH1, GABRG2 and RELN genes. These results offer the first indication that genetic markers of RET, ARHGEF3 and CTNNAL1 and relevant genetic interaction networks confer the altered risk to HSCR in the Han Chinese population.


INTRODUCTION
Hirschsprung disease (HSCR, MIM 142623), as a lifethreatening genetic disorder, affects approximately 1/5000 live births around the world (2.8/10000 newborns in Asian population), which makes it the most frequent congenital disease of intestinal motility [1]. HSCR is a functional intestinal obstruction characterized by congenital absence of enteric neurons along a variable length of the bowel, and accordingly, it can be anatomically classified as short segment HSCR (S-HSCR), long segment HSCR (L-HSCR) and total colonic aganglionosis (TCA), and approximately 80% of HSCR cases belong to S-HSCR [2]. Of note, Hirschsprung disease has also been identified in elderly patients, including a 70-year-old male and a 67-year-old female [3,4]. Additionally, gastrointestinal abnormalities were observed in patients with Parkinson's disease (PD) over 30 years ago, indicating that the enteric nervous system (ENS) might be involved in PD, and mouse models of PD pathogenesis also show varying degrees of enteric neuronal dysfunction [5].
The major causes of Hirschsprung disease can be attributed to genetic factors or gene-gene interaction networks, since HSCR presents crucial features of AGING multifactorial genetic models, such as low sexdependent penetrance, high sibling recurrence risk, high heritability and interfamilial variation [6]. Approximately 80% of HSCR cases are sporadic, and the rest are familial; moreover, HSCR is associated with other congenital malformations in approximately 30% of cases [1,7]. Resembling other complex diseases, a single homozygous null mutation of HSCR susceptibility genes is not sufficient to lead to a serious aganglionosis phenotype in HSCR [8]. To date, genetic studies have revealed that at least 15 genes might be involved in HSCR development, yet these genes account for only ~ 30% of all HSCR cases [9], implying the involvement of more genes in HSCR susceptibility. Whole exome sequencing study has revealed some novel HSCR genes, including DENND3, NCLN,NUP98 and TBATA, all of which might play a role in neuronal development [10]. More recently, new loci have been uncovered to be associated with Hirschsprung disease, such as ALDH1A2, PLD1, CASQ2 and CCT2 [11][12][13].
RET (ret proto-oncogene, located on chromosome 10q11.2), the major gene in HSCR, accounts for more than 80% of all known mutations [10,14]. A recent genome-wide association study has revealed the key role of non-coding RET variants in HSCR [15]. Moreover, it has been indicated that in most HSCR cases, individuals with non-coding RET mutations also carry modifier loci, which contribute to HSCR presentation and phenotype severity, suggesting the potential interactions between RET and other genes involved in HSCR pathogenesis [16]. Taking advantage of the microarray technique and mouse model, Heanue et al. have demonstrated that both Arhgef3 (Rho guanine nucleotide exchange factor (GEF) 3) and Ctnnal1 (catenin (cadherin associated protein), alphalike 1) have human homologues that map to previously identified HSCR susceptibility loci, i.e. the RETdependent susceptibility loci (3p21 and 9q31), which makes them excellent candidate genes for Hirschsprung disease [17][18][19]. In addition, we wondered whether the interactions between RET, ARHGEF3 and CTNNAL1 contribute to the altered risk of HSCR since the joint gene-gene effects may have a significant impact on HSCR development [20]. It has recently been demonstrated that the interaction between RET and PHOX2B polymorphisms substantially affects the risk of Hirschsprung disease, suggesting that HSCR, as a multifactorial genetic disorder, requires the interactions of multiple unlinked genes to produce the phenotype [20]. In a recent study, we have shown that genetic markers within GAL, GAP43 and NRSN1 contribute to the altered HSCR susceptibility, and importantly, the interaction networks among GAP43, NRSN1 and PTCH1 confer an increased risk to Hirschsprung disease [21].
With all these lines of evidence, we tried to determine whether genetic variants of RET, ARHGEF3 and CTNNAL1 have an impact on the risk of Hirschsprung disease, and by recruiting 1015 subjects and 38 polymorphisms within these three genes, we further explored the potential interaction networks among RET, ARHGEF3, and CTNNAL1 and our previously studied GAL, GAP43, NRSN1, PTCH1, GABRG2 and RELN genes.

Allele and genotype distributions of the genetic variants in RET, ARHGEF3, and CTNNAL1
In the 1015 subjects, genotype distributions of all 38 polymorphisms showed no significant deviations from Hardy-Weinberg equilibrium in either HSCR cases or normal controls (P> 0.05). The allele and genotype frequencies of the 38 SNPs are listed in Tables 1-3. We found that there were significant associations between HSCR and 20 genetic variants, including 13 RET SNPs, 3 ARHGEF3 SNPs, and 4 CTNNAL1 SNPs. Of note, all 13 positive SNPs in RET survived the FDR correction in terms of both allele and genotype distributions (Table 1), and the findings observed in genotype distributions of ARHGEF3 rs3732508 and in allele distributions of the 3 ARHGEF3 SNPs and the 4 CTNNAL1 SNPs remained significant after the FDR correction (Tables 2, 3). In addition, compared with previous findings regarding the 4 RET variants (rs2506030, rs7069590, rs2505998 and rs2435357), our present results have shown that the odds ratios have the same magnitudes, suggesting that the cases studied here are representative of Hirschsprung disease (Table 4). Regarding the 20 positive markers, we observed that certain alleles and genotypes presented markedly higher frequencies in the HSCR group than in the control group, such as the T allele and TT genotype of RET rs2435357, the A allele and AA genotype of ARHGEF3 rs3732508, and the G allele and GG genotype of CTNNAL1 rs4978379. Moreover, we also employed Plink software to perform the adjustment for age and gender factors, and all 20 positive polymorphisms survived the correction.

Linkage disequilibrium analysis
We further conducted linkage disequilibrium (LD) analyses of genetic markers within the RET, ARHGEF, and CTNNAL1 genes since LD makes tightly linked SNPs markedly correlated, leading to cost savings for association studies [22]. Figure 1 presents LD results for each pair of genetic variants in the HSCR group and control group. Strong LD (D' > 0.7) was found in multiple groups of markers, such as RET_rs2435353-rs2075913-rs17028, ARHGEF3_rs11720618-rs11925835-rs3732508 AGING  *Pearson's p value, FDR = false discovery rate, SNP = single nucleotide polymorphism, CI = confidence interval, HSCR = Hirschsprung disease. *Pearson's p value, FDR = false discovery rate, SNP = single nucleotide polymorphism, CI = confidence interval, HSCR = Hirschsprung disease.

Haplotype analysis and power calculations
Since haplotype constructed from genetic markers with strong LD (D' > 0.7) will markedly increase the statistical power for association with the disease, we then carried out haplotype analyses of SNPs in these three genes. Moreover, haplotypes were omitted from the analysis if the estimated haplotype probabilities were less than 3% in either the HSCR or control group. Only haplotypes with strong LD were selected for presentation (Supplementary Table 1). In the present study, we observed that multiple haplotypes were significantly associated with HSCR. For the 38 genetic variants within these three genes, haplotype analysis also unraveled some significant global P values (Table 5). For each gene, the most significant AGING  [14,26], FDR = false discovery rate, SNP = single nucleotide polymorphism, CI = confidence interval, HSCR = Hirschsprung disease.
We also used GeneMANIA, a flexible user-friendly database, to explore the functional association networks among RET, ARHGEF3, and CTNNAL1 and our previously studied GAL, GAP43, NRSN1, PTCH1, GABRG2 and RELN genes [21]. As shown in Figure 2C, gene function prediction indicated that RET, ARHGEF3, and CTNNAL1 might be involved in the positive regulation of neuron projection development, regulation of cell projection organization, and small GTPasemediated signal transduction. These nine genes interacted with each other mainly through co-expression and/or genetic interactions, and some of them might contribute to the same pathways, such as signaling by GPCR.

DISCUSSION
HSCR is a congenital complex disorder caused by a deficit in the migration process of enteric neural crest cells (ENCCs). To date, more than 15 genes have been linked to the development of Hirschsprung disease, and yet, the landscape of genetic networks regarding HSCR risk has not been fully characterized. Here, we carried out linkage disequilibrium analyses of 38 genetic markers within the RET, ARHGEF, and CTNNAL1 genes in 502 HSCR cases and 513 normal controls and identified significant associations of these three genes with altered HSCR susceptibility. Moreover, our present work for the first time unraveled that the interaction networks among RET, ARHGEF3, and CTNNAL1 and our previously studied GAL, GAP43, NRSN1, PTCH1, GABRG2 and RELN genes might contribute to an increased risk of HSCR.

AGING
Recent genetics studies have mainly focused on the major HSCR gene, RET, which contains at least 80% of all known HSCR-causing mutations [24]. Our present results showed that 13 out of 14 genetic variants within RET were significantly linked to the altered HSCR susceptibility. Of note, certain alleles and/or genotypes of these 13 positive SNPs might be risk factors for HSCR, such as the A allele and AA genotype of rs2505998, the T allele and TT genotype of rs2435357 and the C allele and CC genotype of rs752978, while the others might be protective factors against HSCR, such as the T allele and TT genotype of rs17028 and the C allele and CC genotype of rs2435355. Additionally, the 3 positive markers (rs17028, rs2742240 and rs2435355) may further affect the regulatory mechanisms of gene expression since all of them are located in the 3' UTR region [25]. Regarding the 4 SNPs (rs2506030, rs7069590, rs2505998 and  *The best model was referred to as the one with the maximum testing accuracy and maximum cross-validation consistency (CVC); MDR = multifactor dimensionality reduction, CI = confidence interval, HSCR = Hirschsprung disease.
rs2435357) that have been reported in Hirschsprung disease [14,26], our findings also supported that these 4 markers contribute to the altered HSCR risk.
We then tried to interrogate the relationship between ARHGEF3, CTNNAL1 and HSCR, since both of them were indicated as excellent candidate genes for HSCR [17]. Our current work presented strong associations of ARHGEF3 and CTNNAL1 with Hirschsprung disease.
As the positive variant rs3732508 in ARHGEF3 is synonymous, it may not change the amino acid directly, and yet, based on recent studies, synonymous SNPs can affect mRNA splicing, stability and structure as well as protein folding, which may cause changes in protein function and cellular response to therapeutic targets [27]. In humans, ARHGEF3 belongs to the family of RhoGEFs (Rho guanine nucleotide exchange factor), which promote GDP to GTP exchange [28]. A recent study showed that knockdown of ARHGEF3 markedly inhibits NPC (nasopharyngeal carcinoma) cell growth and migration [29]. On the other hand, rs7021366_CTNNAL1, the positive marker, is a missense mutation, which results in a change in the amino acid and thus may substantially have an impact on the function of proteins. CTNNAL1 (α-Catulin) is a cytoplasmic molecule that integrates the crosstalk between nuclear factor-kappa B and Rho signaling pathways, and attenuation of α-Catulin in vitro blocked cell migration and invasion induced by other proteins [30,31]. Regarding ARHGEF3 and CTNNAL1, our work also revealed some potential risk factors for HSCR, such as the C allele and CC genotype of ARHGEF3 rs11720618, the A allele and AA genotype of ARHGEF3 rs3732508, and the G allele and GG genotype of CTNNAL1 rs7021366, all of which might confer an increased risk of HSCR. Importantly, additional replication studies recruiting larger Asian and non-Asian samples and more markers will certainly be needed in the future.
Interestingly, both ARHGEF3 and CTNNAL1 map to the RET-dependent HSCR susceptibility loci identified at 3p21 and 9q31, respectively [18,19]. Since ARHGEF3 encodes a RhoGEF and CTNNAL1 can interact with RhoGEFs, it has been indicated that the two genes potentially interact in the modulation of cell migration [17,33]. We thus enrolled the multifactor dimensionality reduction (MDR) strategy to explore the gene-gene interactions among RET, ARHGEF3 and CTNNAL1 (Figure 2A, 2B), and interrogated the best interaction model with maximum testing accuracy and maximum cross-validation consistency (CVC) among these three genes. Of note, MDR is a nonparametric model-free method that does not require particular inheritance model AGING to detect gene-gene interactions without main gene effects in case-control studies of complex diseases [34]. In our MDR analysis, the best interaction models also contained certain positive genetic variants associated with HSCR, including RET_rs7069590, ARHGEF3_rs11925835, ARHGEF3_rs3732508 and CTNNAL1_rs7021366 (Table  6). Interestingly, the best four-factor model, RET (rs7069590)-ARHGEF3 (rs11717604)-ARHGEF3 (rs3732508)-CTNNAL1 (rs7021366), showed the most significant OR compared to other models, further supporting that a multifactor model might play a major role in HSCR susceptibility. Since 2 positive cSNPs (ARHGEF3_rs3732508 and CTNNAL1_rs7021366) were also involved in this best four-factor model, the interaction between them might have a functional impact on HSCR pathogenesis.
Taking advantage of the GeneMANIA platform, we then explored functionally related gene-gene interaction networks among RET, ARHGEF3, and CTNNAL1, and our previously studied GAL, GAP43, NRSN1, PTCH1, GABRG2 and RELN genes [21]. These HSCRassociated genes were linked to each other through coexpression, genetic interactions, co-localization, or they might be involved in the same pathways or might be targeted by the same microRNAs ( Figure 2C). Moreover, RET, ARHGEF3 and CTNNAL1 might functionally contribute to the regulation of cell projection organization, small GTPase-mediated signal transduction and positive regulation of neuron projection development, and thus they might be involved in HSCR etiology, as HSCR is due to a deficit in the development of the enteric nervous system.
Taken together, our present data show that genetic variants and haplotypes in RET, ARHGEF3 and CTNNAL1 confer an altered risk to Hirschsprung disease in the Han Chinese population. To the best of our AGING knowledge, this study offers the first indication that the gene-gene interactions among RET, ARHGEF3 and CTNNAL1 contribute to an increased risk of HSCR. Moreover, we have also unraveled the potential interaction networks consisting of 9 HSCR-related genes, which might be involved in HSCR susceptibility. Future research is required to fully understand the complexity of genetic interaction networks involved in HSCR risk and address the genetic basis of Hirschsprung disease.

Study subjects
In the present study, we recruited 1015 subjects, including 502 sporadic HSCR cases (383 males and 119 females, age 1.34 ± 2.12 years) and 513 normal controls (310 males and 203 females, age 2.70 ± 3.13 years). All the participants were of Han Chinese origin and were enrolled from biologically unrelated residents. The 502 cases in the study had the diagnosis of HSCR based on histological examination of surgical or biopsy resection material, including 369 S-HSCR (short segment HSCR), 74 L-HSCR (long segment HSCR) and 59 TCA (total colonic aganglionosis). Controls were randomly recruited from the general population with no history of chronic constipation. We received approval for the study from Xinhua Hospital and Capital Institute of Pediatrics and obtained written informed consent from participants or their parents after the procedure had been fully explained. DNA was extracted using the QIAamp DNA Blood Midi Kit (Qiagen, Valencia, CA).

SNP-SNP interaction network analysis
In our present study, multifactor dimensionality reduction (MDR) analysis was employed to investigate the SNP-SNP interaction networks in regard to Hirschsprung disease. MDR software (version 3.0.2) was thus used to carry out the MDR analysis, and the risk factors were explored in the best model, which maximized both testing accuracy and cross-validation consistency (CVC) [35]. Specifically, MDR takes advantage of cross-validation by dividing the data into a training dataset (i.e. 9/10 of the data) and a testing dataset (i.e. the remaining 1/10 of the data) to derive estimates of cross-validation consistency and testing accuracy [35]. Moreover, GeneMANIA, a flexible user-friendly database, was used to further interrogate the gene-gene interaction networks and to perform a function prediction, and specifically, the interaction networks derived from GeneMANIA are based on multiple datasets, such as co-expression, genetic interactions and consolidated pathways datasets [36].

Statistical analysis
SHEsis (http://analysis.bio-x.cn/myAnalysis.php) was used to assess Hardy-Weinberg equilibrium (HWE), odds ratio (OR), 95% confidence interval (CI), allelic and genotypic association, and to study linkage disequilibrium (LD), allelic and haplotype distribution [37]. The false discovery rate (FDR) controlling procedure was used to correct the P values of genetic analysis [38]. All P values were two-tailed, and the significance level was set at P = 0.05. G*Power 3 was used to conduct the power calculation [39]. Plink software was employed to carry out the adjustment for age and gender factors in the genetic analysis [40].

AUTHOR CONTRIBUTIONS
YW, QJ, LL and WC conceived the study. YW, HC and ZX performed the experiments. YW, QJ, LL and WC contributed to data analysis. QJ, WW and BG collected the samples. YW, QJ and AC wrote the manuscript. LL and WC supervised the study. All authors approved the final manuscript.