Association of common variation in ADD3 and GPC1 with biliary atresia susceptibility

Biliary atresia (BA) is an idiopathic neonatal cholestatic disease. Recent genome-wide association study (GWAS) revealed that common variation of ADD3, GPC1, ARF6, and EFEMP1 gene was associated with BA susceptibility. We aimed to evaluate the association of these genes with BA in Chinese population. Twenty single nucleotide polymorphisms (SNPs) in these four genes were genotyped in 340 BA patients and 1,665 controls. Three SNPs in ADD3 were significantly associated with BA, and rs17095355 was the top SNP (PAllele = 3.23×10-6). Meta-analysis of published data and current data indicated that rs17095355 was associated with BA susceptibility in Asians and Caucasians. Three associated SNPs were expression quantitative trait loci (eQTL) for ADD3. Two GPC1 SNPs in high linkage disequilibrium (LD) showed nominal association with BA susceptibility (PAllele = 0.03 for rs6707262 and PAllele = 0.04 for rs6750380), and were eQTL of GPC1. Haplotype harboring these two SNPs almost reached the study-wide significance (P = 0.0035). No association for ARF6 and EFEMP1 was found with BA risk in the current population. Our study validated associations of ADD3 and GPC1 SNPs with BA risk in Chinese population and provided evidence of epistatic contributions of genetic factors to BA susceptibility.


INTRODUCTION
Biliary atresia (BA) is a devastating inflammatory and fibro-obliterative disease of the infant biliary tree involving extra-and intrahepatic bile ducts which invariably leads, if left untreated, to cholestasis and hepatic fibrosis even progresses to liver cirrhosis and eventually liver failure [1]. The most effective treatment of choice is palliative surgery (Kasai operation) and the majority of patients would still need liver transplantation later in life due to the progressive intrahepatic bile ducts injury [2]. The majority of BA (about 80 % of cases) occurs as an isolated defect without any associated disorders, and 10%-20% of patients with at least one major congenital malformation [3,4]. The occurrence of BA has geographical, seasonal and gender differences. The incidence rate of BA in western countries is about (0.5 to 0.8)/10,000, which is lower than Asians. The incidence is 1.5/10,000 in Taiwan, and about 1.1/10,000 in Japanese population [5,6]. BA exhibits a slight gender bias, with a female to male ratio about 1.25:1 [7]. It is likely to be a multifactorial disease, in that environmental and genetic interaction underlies its pathogenesis. The genetic basis of BA is quite complicated. It was found that the disease could be inherited in a dominant or recessive AGING pattern but more probably was a polygenic condition with incomplete penetrance, genetic heterogeneity and variable clinical manifestations [3,8]. In the past twenty years, a number of risk genes were found [9][10][11][12][13][14][15][16]. Recent genome-wide association studies (GWASs) revealed that variants in adducing-3 (ADD3), glypican-1 (GPC1), adenosine diphosphate-ribosylation factor-6 (ARF6) and epidermal growth factor-containing fibulinlike extracellular matrix protein 1 (EFEMP1) were associated with BA susceptibility [9,10,12,16].
A previous GWAS in Chinese population firstly identified a susceptibility locus for BA on 10q24.2 with rs17095355 as the lead single nucleotide polymorphism (SNP), which is located in the intergenic region between the X-prolyl aminopeptidase 1 (XPNPEP1) and ADD3 genes [9]. The association was then validated in Thai, Chinese and European population [11,[17][18][19][20][21]. Further study in model organism revealed that both xpnpep1 and add3a were expressed in the liver of developing zebrafish, only knockdown of add3a produced intrahepatic defects and decreased biliary function by activating Hedgehog signaling [22]. Chromosome 2q37 was identified as a potential susceptibility region for BA in a GWAS and continued fine-mapping indicated GPC1 as a susceptibility gene [10,23]. Disruption of gpc1 in zebrafish led to biliary defects for overactivation of Hedgehog signaling [23]. Two common SNPs in GPC1 were subsequently investigated in a Chinese case-control sample-set containing 134 cases and 618 controls, which found a significant association with rs2292832 and a marginal effect with rs3828336 [24]. A GWAS with 80 Caucasian BA cases and 2,818 controls found SNPs rs3126184 and rs10140366 in the 3′ flanking region of ARF6 were associated with BA risk [12]. Knockdown of the two zebrafish homologs, arf6a and arf6b, caused a sparse intrahepatic biliary network, several biliary epithelial cell defects, and poor bile excretion to the gall bladder [12]. EFEMP1 was found association with BA in a recent European-American population-based GWAS including 343 isolated BA patients and 1,716 controls, which was validated an independent European-American cohort including 156 patients with BA and 212 geneticallymatched controls [16]. RNA expression analysis and immunohistochemistry analysis demonstrated that expression of EFEMP1 was higher in BA patients than in controls [16].
With the aim to comprehensively investigate these newly identified susceptibility genes from recent GWASs, we conducted a case-control study in Chinese population consisting of 340 patients and 1,665 controls. Since ADD3 variants were repeatedly studied, we performed a meta-analysis for BA association with the top SNP rs17095355. We also explored the functional consequences of associated SNPs via bioinformatics methods.

Case-control association study
Detailed clinical information and biochemical indexes of 340 BA patients are shown in Table 1. A total of 340 cases and 1,665 controls were genotyped for 20 SNPs. Two SNPs (rs10140366 and rs2292832) were filtered out for failure in assays. Seven samples were excluded for further analysis for genotyping missing rates ≥ 5%. The genotypes of the remaining 18 SNPs were conformed to Hardy-Weinsberg equilibrium (HWE) (P > 0.05) and the minor allele frequencies (MAFs) were all above 0.01. The allele and genotype frequencies are shown in Table 2 and Table 3.
We further investigated whether ADD3 SNP haplotypes were associated with BA susceptibility. Three associated SNPs of ADD3 constructed a haplotype block. The frequency of haplotype rs17095355T -rs10509906G -rs2501577G in cases was significantly higher than that in controls (44% vs 36%, P = 4.86×10 -5 , OR = 1.42, 95% CI = 1.20-1.68; Table 4). Haplotype rs17095355C -rs10509906C -rs2501577A showed significant protective effect with P = 1.00×10 -4 (16% in cases vs 22% in controls; OR = 0.65, 95% CI = 0.52-0.81; Table 4).   Table 2). However, the two SNPs could not reach study-wide significance (0.05/18 = 0.0027). The genotype distribution of rs6707262 was nominally different between cases and controls (PGenotypic = 0.043; Table 3). Haplotype analysis revealed these two SNPs and an adjacent SNP (rs1316479) constructed a haplotype block, and haplotype rs1316479G -rs6750380G -rs6707262G almost reached the study-wide significance (P = 0.0035) ( Table 5). These two SNPs were in nearly perfect LD (r 2 = 0.98), suggesting that they represent a same signal ( Figure 1B). These data indicated common genetic variation of GPC1 contributed to BA susceptibility in Chinese population.    The previously associated ARF6 SNP rs3126184 showed no significance in our samples ( Table 2). The frequencies of rs3126184 allele T were 0.030 in cases and 0.037 in controls in current Chinese population. However, it was more frequent with 0.29 in cases and 0.13 in controls in Caucasian [12]. We found no associations of four previously reported risk SNPs of EFEMP1 with BA susceptibility in current samples. The allele frequencies in healthy controls of these four SNPs were different between current study and the European-American cohort, where the associations were firstly discovered [16]. But the effect direction of three SNPs was consistent with that in previous study (Supplementary Table 1).
Lastly, we investigated whether there was a cumulative genetic effect with respect to the disease risk for ADD3 SNP rs17095355 and GPC1 SNP rs6707262 ( Figure 3). The individuals can be divided into four classes according to the number of risk alleles that they carry ( Figure 3A). There is an increase in ORs for BA occurrence with the increasing number of risk alleles against the baseline group of individuals carrying no risk alleles. Those carrying four risk alleles were more than twice as likely to have BA (OR = 2.56, 95% CI = 1.23-5.32; Supplementary Table 2) compared with those of a two-factor model are associated with risk to BA best. In each cell, the left bar represents a positive score, and the right bar represents a negative score. High risk are represented by dark shading cells and low-risk cells by light shading. Rs17095355 was in ADD3 region and rs7577243 was in GPC1 region. a. The best model was referred to as the one with the maximum testing accuracy and maximum cross-validation consistency (CVC). GMDR: generalized multifactor dimensionality reduction; OR: odds ratio; CI: confidence interval. Rs17095355, rs7577243 and rs11125609 were on ADD3, GPC1 and EFEMP1, respectively.

AGING
carrying none. We then evaluated the discriminatory power of a genetic test based on these two susceptibility SNPs by calculating the area under the receiver operating characteristic (ROC) curve, and the area under the curve (AUC) was estimated to be 0.58 ( Figure 3B).

Meta-analysis
Literature searches and selection yielded 7 involved studies, which comprised 8 case-control studies [9,11,[16][17][18][19]21]. The study of Garcia-Barcelo MM, et al. included a GWAS stage and a replication stage in two independent samples [9], which were considered as two case-control studies in our meta-analysis (Table 7). Additionally, we included the data from the GWAS by Chen Y, et al [16] and the allele information of rs17095355 was obtained from the authors, which was imputed from the GWAS data with a info score of 0.998. The cases in the study of Tsai E.A et al [20] were part of samples from the study by Chen Y, et al, we therefore only included data from Chen Y, et al in the meta-analysis. Together with present study, a total of 9 case-control data consisting of 2,227 cases and 6859 controls was included in the meta-analysis ( Figure 4). The risk allele T of has a higher frequency in Asians than in Europeans. The significant associations were consistent among 9 studies, although heterogeneity was found (I 2 =66%, p value <0.01, Figure 4). Therefore, the pooled OR was 1.61 (95% CI = 1.

Functional annotation of associated SNPs
At ADD3 locus, three associated SNPs (rs17095355, rs10509906 and rs2501577) were located in the intron region of ADD3. Rs17095355 and rs2501577 fall within a strong enhancer activity region (Supplementary Table  3) and they all alter the sequences of DNase I hypersensitivity sites and transcription factor binding motifs annotated by Roadmap (Supplementary  Table 3). These three SNPs were expression quantitative trait loci (eQTLs) in multiple tissues from Genotype-Tissue Expression (GTEx) databases and were correlated with ADD3 expression in immune system tissues including spleen and whole blood, where was thought to be involved in the progress of BA (Supplementary Figure 1). Of note, the risk allele T of rs17095355 was significantly associated the increased level of ADD3 in spleen (P = 5.1×10 -13 , Supplementary  Figure 1).
Rs6750380 and rs6707262 at 5'upstream of GPC1 were located in a strong enhancer region as well as a site altering regulatory motifs and proteins bounding sites annotated by Roadmap (Supplementary Table 3). Rs6707262 was eQTL of GPC1 in testis (P = 4.6 ×10 -

Protein expression and epigenetic modification of associated genes
In silico analysis revealed that ADD3 had a medium expression level in liver and a high expression level in gallbladder (Supplementary Figures 4A and 5). GPC1 AGING was not expressed in adult liver and gallbladder tissues ( Supplementary Figures 4B and 5) ADD3 showed significant difference in expression levels and methylation status between fetal and adult liver, with an approximately 2-fold higher expression level in fetal liver [25]. Four CpG sites located at ADD3 gene region were differentially methylated when comparing the methylation patterns of the adult liver with the fetal liver [25].

The protein-protein interaction (PPI) and coexpression results
Hedgehog signaling is an important mechanism in the pathology of BA and liver development. PPI analysis showed GPC1, ARF6, and EFEMP1 gene interacted with Hedgehog pathway or related genes ( Figure 5). GPC1 was linked with Sonic Hedgehog (SHH) with experimentally determined evidence ( Figure 5). Experimentally determined evidence also demonstrated that ARF6 and EFEMP1 gene were interacted with cadherin 1 (CDH1), which was linked to Hedgehog pathway members glioma-associated oncogene homolog 1 (GLI1), SHH and smoothened (SMO) Figure 5. The protein-protein interaction (PPI) network based on STRING database of studied genes. The network is constructed for the four studies genes and Hedgehog pathway genes. The network nodes are proteins. The edges represent the predicted functional associations. An edge may be drawn with up to four different colored lines and these lines represent the existing associations that were predicted. A green line: neighborhood evidence; a blue line: cooccurrence evidence; a purple line: experimental evidence; a yellow line: textmining evidence; a black line: coexpression evidence.
( Figure 5). Although knockdown of add3 activated the Hedgehog pathway in zebrafish larvae, no recognized link between ADD3 and the Hedgehog pathway was found.

DISCUSSION
We performed association analysis for four BA susceptibility genes of discovered in recent GWASs. Our results validated that three ADD3 variants (rs17095355, rs10509906 and rs2501577), and two GPC1 variants (rs6750380 and rs6707262) were associated with BA susceptibility in Chinese population. Meta-analysis for rs17095355 association with BA further confirmed the association in Asian and Caucasian population. Associations of ARF6 and EFEMP1 SNPs were not replicated in current sampleset.
The 10q24.2 region encompassing ADD3 and XPNPEP1 genes was found association in a GWAS of Chinese population, and further fine-mapping of this region identified ADD3 as the susceptibility gene [9,17]. Morpholino antisense oligonucleotide (MO) knockdown targeting add3a in zebrafish, not xpnpep1, produced intrahepatic defects and decreased biliary function [22]. The risk allele T of the top SNP rs17095355 was found association with decreased level of ADD3 in BA liver tissues, but no such correlation was found for XPNPEP1 [17]. Rs17095355 was also found to act as an eQTL for ADD3 in whole blood and spleen from the GTEx database. These foundings indicated that ADD3 was the BA susceptibility gene at 10q24.2. The association between rs17095355 of ADD3 and BA was investigated repeatedly in multiple studies from different population [11,[18][19][20][21], and a meta-analysis comprising six case-control studies before 2015 has been conducted [26]. We incorporated the published data before 2015, the newly published data and our current data to perform a further meta-analysis. In Asian population, rs17095355 showed consistent significant association with BA [11,18,19,21]. Rs17095355 also showed significant association in European descent, but rs7099604 showed more significant association [20]. These evidences revealed ADD3 as a common susceptibility gene in Asian and Caucasian population. The risk allele T of rs17095355 was more frequent in Asian than in Europe decedents, which might contribute to the higher incidence of BA in Asian.
ADD3 encodes adducin-γ belonging to Adducin family. Adducins are heteromeric membrane skeletal proteins composed of different subunits referred to as adducin alpha, beta and gamma. Adducin-γ are ubiquitously AGING expressed and abundantly expressed in biliary epithelia [17]. Adducins are involved in the assembly of spectrinactin network in erythrocytes and at sites of cell-cell contact in epithelial tissues. Notably, the functional roles of adducins in remodeling of epithelial junctions during embryonic morphogenesis indicated that adducins might be involved in the biliary pathology in BA [27]. Morpholino-mediated knockdown of add3 activated the Hedgehog pathway in zebrafish larvae, providing a previously unrecognized link between ADD3 and the Hedgehog pathway [17]. It has long been recognized that BA is characterized by excessive Hedgehog pathway activity, which stimulated biliary epithelial-mesenchymal transitions (EMT) and might contribute to biliary dysmorphogenesis during liver development [28]. The underlying molecular mechanisms though which ADD3 regulates Hedgehog signaling needs further exploration.
Rare copy number variants and common variants of GPC1 both contributed to BA risk [10,23,24]. We genotyped ten tag SNPs in the current sample-set and confirmed GPC1 association with BA risk. Two new associated SNPs were identified (rs6750380 and rs6707262), which also had eQTL effects on GPC1. GPC1 encodes glypican-1, one of six members of the glypican family, which attach to the cell membrane by a glycosyl-phosphatidylinositol linkage. Previous study showed that glypican-1 was located in the apical membrane of cholangiocytes and had reduced levels in diseased liver from BA patients [23]. Knockdown of gpc1 in zebrafish led to developmental biliary defects resembling BA and Hedgehog activity was increased in the livers of gpc1 morphants [23]. Glypican-3 (GPC3) acted as a negative regulator of Hedgehog signaling, through interacting with high affinity with Hedgehog and competing with Patched for Hedgehog binding [29]. Together, these findings suggest GPC1 could act as an inhibitor for Hedgehog ligands via the similar mechanisms as GPC3.
A GWAS in Caucasian identified ARF6 as a susceptibility gene at 14q21.3 [12]. ARF6 shows a medium expression level in liver and gallbladder (Supplementary Figures 4C and 5). Knockdown of the two zebrafish homologs resembled the syndromes of BA, which indicated that arf6 was required in early biliary development [12]. The frequency of rs3126184 risk allele in Caucasian controls was 0.13, but only 0.037 in current controls. The association was not validated in our samples. Since only two reported SNPs were studied, we could not preclude the possibility that other variants of ARF6 were associated with BA risk. Another explanation for lack replication of the association might be the genetic heterogeneity, that ARF6 might be a Caucasian specific susceptibility gene. EFEMP1 mapping to chromosome 2p16, encodes epidermal growth factor-containing fibulin-like extracellular matrix protein 1, which is also known as Fibulin-3. Its main role is to maintain basement membrane stability and extracellular matrix integrity, which is implicated in cell proliferation and organogenesis [16,30,31]. EFEMP1 is also a major extracellular matrix protein involving in the biological process of fibrosis [32]. The expression level of EFEMP1 was higher in BA patients than in controls [16]. Together, these findings suggest a potential role for EFEMP1 in the pathogenesis of BA. A cluster of SNPs within EFEMP1 gene were significantly associated with BA susceptibility in a recent GWAS in Europeans [16]. Four tag SNPs in the current study did not reach the significance level, however, showed the same effect direction as in the original study [16]. Given the moderate effects of this locus, our sample was not large enough to detect the association. Therefore, further studies were needed to validate this association in other independent samples.
In summary, we confirmed association of variants in ADD3 and GPC1 with BA susceptibility in Chinese population. The interaction of SNPs in diseaseassociated genes contributed to BA susceptibility. Bioinformatics analysis revealed that the risk SNPs influenced the expression of susceptibility genes.

Subjects
A total of 340 unrelated patients were recruited. Diagnose of BA was based on clinical manifestations, laboratory tests, imaging examinations and ultimately confirmed by cholangiography. Patients with other associated congenital malformations were excluded from the study. Clinical information of patients was shown in Table 1. Totally, 1,665 unrelated healthy individuals without BA, other congenital diseases, autoimmune, or liver disease were enrolled as controls. All participants were biologically unrelated Chinese Han individuals and were recruited at Xinhua hospital affiliated to Shanghai Jiao Tong University School of Medicine from 2008 to 2018. Peripheral blood samples were collected in a standard EDTA tube for DNA extraction and all data was recorded anonymously. Genomic DNA was extracted from peripheral blood leukocytes using QIAamp DNA Blood Mini Kit according to the manufacturer's protocol (Qiagen, Hilden, Germany). Written informed consent was obtained from all participants or their parents. This study was conducted in accordance with the Declaration of Helsinki (version 2002) and was approved by the institution review board of Xinhua Hospital affiliated to Shanghai Jiao Tong University School of Medicine.

SNP selection
A GWAS in Chinese population revealed BA association with 10q24.2 region encompassing ADD3 and XPNPEP1 [9]. Subsequent fine-mapping indicated that a risk haplotype, consisting of five SNPs: rs17095355, rs10509906, rs2501577, rs6584970, and rs7086057, could capture the 10q24.2 risk alleles [17]. Among the five SNPs, rs2501577, rs6584970 and rs7086057 were in high LD (r 2 ≥ 0.98). Therefore, we select rs17095355, rs10509906 and rs2501577 for replication analysis. We selected 10 tag SNPs from South Han Chinese data in 1000 genomes project database to cover the common variation in GPC1 gene region. Rs2292832 failed in the assay. Two SNPs (rs3126184 and rs10140366) in perfect LD 3' upstream of ARF6 were reported association with BA in Caucasian children [12]. We genotyped these two SNPs in our samples, but rs10140366 failed in the assay. About 13 SNPs in high LD within EFEMP1 region on 2p16.1 were associated with BA susceptibility in a European-American cohort [16]. We selected 4 tag SNPs including the top SNP (rs10865291) for replication.

SNP genotyping
Genotyping was performed using the Fluidigm 96.96 Dynamic Array IFCs (Fluidigm, San Francisco, CA, United States) [33]. Cases and controls were plated out in sets of 96 samples and combined into 384-well arrays for genotyping. Polymerase chain reaction (PCR) was performed in a 5-µl reaction and cycling conditions were set using the standard procedure according to the manufacturer's protocol. To obtain genotype calls, we analyzed the data using EP1 SNP Genotyping Analysis software. The software defined the genotype of each sample based on the relative fluorescence intensities.

Functional annotation
We first investigated the functional consequences of the associated SNP by checking HaploRegv4.1 database. To examine whether the associated SNPs were eQTL, we made inquiries in GTEx Analysis Release V8 (dbGap Accession phs000424.v8.p2) [34]. The GTEx project collected and analyzed multiple human tissues from donors who were densely genotyped to assess genetic variation within their genomes. By analyzing global RNA expression within individual tissues and treating the expression levels of genes as quantitative traits, variations in genes expression that are highly correlated with genetic variation can be identified as eQTL.

Meta-analysis
Since 2010 when 10q24.2 region was implicated association with BA in, rs17095355 was repeatedly genotyped in the following studies, thus we performed a meta-analysis of rs17095355 association with BA risk. In order to find eligible studies, we searched PubMed using combinations of the following terms: "ADD3" or "adducin 3" or "XPNPEP1" or "X-prolyl aminopeptidase 1" and "biliary atresia" and "association". We also searched the reference list of review articles and lists of publications of researchers working in this field. The included data covered all English-language publications up to October 2019. Meta-analysis was conducted using the Meta package in R (http://cran.r-project.org/web/packages/meta/index. html) [35]. The I 2 was calculated to quantify the magnitude of between-study heterogeneity and the Cochrane Q statistic was used to determine significance for heterogeneity. An I 2 of 25%, 50%, and 75% represents low, medium, and large heterogeneity, respectively.

In silico protein expression and epigenetic analysis
We searched for the expression pattern of studied genes in THE HUMAN PROTEIN ATLAS (https://www. proteinatlas.org/). The immunohistochemistry results in liver and gallbladder tissues were extracted. EWAS Catalog β (http://www.ewascatalog.org/) was used as a lookup for epigenetic modificaton of studied genes.

PPI network construction
We explored PPI using STRING database (http://stringdb.org/) [36]. Four studied genes (ADD3, GPC1, ARF6, and EFEMP1) and Hedgehog pathway genes were used to query STRING database. The PPI relationships were analyzed on the STRING database with the required confidence (combined score) > 0.4 as the threshold. After the PPIs were searched, the PPI network was constructed on STRING website.

Statistical analysis
Quality control was performed using PLINK 1.09 [37]. HWE of each SNP in both case and control groups was tested. Four genetic models, including the allelic, additive, dominant and recessive model, together with a genotypic association test (2df test) were used to analyze the association for each SNP using PLINK 1.09 [37]. We calculated per allele OR and 95%CI. We calculated LD between SNPs and constructed haplotype block using Haploview4.2 [38]. Haplotype phasing was performed using SHAPEIT and haplotype association was tested using R package [39]. Conditional logistic analysis was performed to find additional markers with independent effect by adding the top associated markers as covariates in logistic regression. The study-wide significance threshold for SNP association analysis is AGING P = 0.027 (0.05/18). Gene-gene interactions were investigated using GMDR software Beta 0.9 [40]. ADD3 SNP rs17095355 and GPC1 SNP rs6707262 were used to build the risk assessment model. The genotypes of each SNP were coded as 0, 1, or 2 indicating the number of risk alleles in one individual. The cumulative genetic risk score of each individual is the sum of rsik alleles from the two SNPs (score range, 0 -4). To test the prediction capability of the model, we generated the ROC curve and calculated the AUC using the pROC R package [41].

Supplementary Tables
Supplementary Table 1