Genome-wide association study identifies 8p21.3 associated with persistent hepatitis B virus infection among Chinese

Hepatitis B virus (HBV) infection is a common infectious disease. Here we perform a genome-wide association study (GWAS) among Chinese populations to identify novel genetic loci involved in persistent HBV infection. The GWAS scan is performed in 1,251 persistently HBV-infected subjects (PIs, cases) and 1,057 spontaneously recovered subjects (SRs, controls), followed by replication in four independent populations comprising a total of 3,905 PIs and 3,356 SRs. We identify a novel locus at 8p21.3 (index rs7000921, odds ratio = 0.78, P = 3.2 × 10⁻¹²). Furthermore, we identify significant expression quantitative trait locus associations for the INTS10 gene at 8p21.3. We demonstrate that INTS10 suppresses HBV replication via IRF3 in liver cells. In clinical plasma samples, we confirm that INTS10 levels are significantly decreased in PIs compared with SRs, and negatively correlated with the HBV load. These findings highlight a novel antiviral gene, INTS10, at 8p21.3 in the clearance of HBV infection.

Hepatitis B virus (HBV) infection is one of the major infectious diseases, with >250 million chronic carriers worldwide, causing a broad spectrum of liver diseases ranging from asymptomatic carrier status, fulminant hepatitis, chronic hepatitis and liver cirrhosis to hepatocellular carcinoma (HCC) (ref. 1). There is an urgent health care need to understand and control chronic HBV infection. Persistent HBV infection or HBV clearance has been considered a multifactorial and polygenic event with viral, environmental and genetic components (refs 2,3). Segregation analyses and twin studies strongly support a role for host genetic factors in determining the persistence of HBV infection (refs 4,5). Recently, several genome-wide association studies (GWASs) have identified single nucleotide polymorphisms (SNPs) at eight loci linking genetic susceptibility to persistent HBV infection in populations of Asian ancestry, including HLA-DP (index rs3077 and rs9277535), HLA-DQ (rs2856718 and rs7453920), HLA-C (rs3130542), EHMT2 (rs652888), TCF19 (rs1419881), CFB (rs12614) and two non-HLA loci, UBE2L3 (rs4821116) and CD40 (rs1883832) (refs 6-10). However, in some of these studies, a relatively small sample size or the unknown history of HBV exposure in the control subjects could dramatically reduce the power to detect low-penetrance loci with modest effects. Furthermore, susceptibility to infectious diseases is considered to be determined at different functional levels (ref. 11), suggesting that additional genetic factors remain to be discovered.
To identify new loci conferring susceptibility to persistent HBV infection among Chinese, here we conduct a GWAS consisting of 1,251 persistently HBV-infected subjects (persistently infected (PIs); cases) and 1,057 spontaneously recovered subjects (spontaneously recovered (SRs); controls), followed by validation of top candidate SNPs in four independent sample sets including a total of 3,905 PIs and 3,356 SRs. The study confirms previously associated genetic loci and discovers a novel protective locus at 8p21.3 (index SNP rs7000921). By expression quantitative trait locus (eQTL) analyses and functional studies, we further demonstrate that the nearby gene integrator complex subunit 10 (INTS10) at 8p21.3 suppresses HBV replication in an interferon regulatory factor 3 (IRF3)-dependent manner in vitro. In summary, the GWAS identifies a novel antiviral gene, INTS10, at 8p21.3 in the clearance of HBV infection.

Results
Genome-wide association analyses. To detect novel loci conferring susceptibility to persistent HBV infection, we carried out a two-stage GWAS (Supplementary Fig. 1). In the discovery GWAS stage, we used genotypes from 12,027 individuals generated on various genotyping platforms providing genome-wide coverage (Table 1 and Supplementary Note) (refs 12-15). Using the available plasma/serum of these subjects, we determined which of them were PIs (cases) or SRs (controls) by screening for hepatitis B surface antigen (HBsAg), antibodies against HBsAg (anti-HBs) and antibodies against hepatitis B core antigen (anti-HBc). In total, 1,251 cases and 1,057 controls were included in the GWAS stage, all of whom are of Chinese ancestry and were recruited from Guangxi, Guangdong and Jiangsu provinces (Table 1, Supplementary Table 1a and Supplementary Note). In the replication stage, four independent sample sets of Chinese ancestry, recruited from Jiangsu, Guangxi, Guangdong and Beijing, respectively, were included (Supplementary Note). With the same sample inclusion and exclusion criteria as those used in the discovery GWAS stage, the replication stage consisted of 3,905 cases and 3,356 controls in total (Table 1 and Supplementary Table 1a) (refs 16,17).
To extend genomic coverage in the GWAS stage, we used genotypes of autosomal SNPs that passed strict quality checks to impute genotypes of SNPs across the chromosomes for all subjects (Methods section and Supplementary Table 2). We performed three rounds of imputation using the data from the HapMap project phase II, HapMap project phase III and the 1000 Genomes Project as references, and generated genotypes of 2,177,782, 1,059,015 and 4,494,311 SNPs, respectively (Supplementary Table 3). To assess the accuracy of genotyping and imputation, we resequenced a ~127-kb genomic region at 1p36.22 in 274 subjects randomly selected from the GWAS stage (Supplementary Note). Excellent concordance between the array genotyping and sequencing was observed in these individuals (98.6%; P < 2.2 × 10⁻¹⁶, Kappa test; Supplementary Table 4). High consistency between the imputation and sequencing was also observed (Pearson's correlation r = 0.94, P < 2.2 × 10⁻¹⁶; Supplementary Fig. 2a). Moreover, we noted that the SNPs with high imputation quality (imputation r² > 0.8) showed a higher consistency between the imputation and sequencing than those with low imputation quality (Pearson's correlation r = 0.95 and 0.86, respectively; Supplementary Fig. 2b,c).
Having shown the validity of array genotyping and imputation data in the GWAS stage, we then carried out genotype-phenotype association analyses using non-integer allele numbers in a logistic regression model, with adjustment for age, sex and principal components-based correction for population stratification (Methods section and Supplementary Fig. 3). A quantile-quantile plot showed a good match between the distributions of observed P values and those expected by chance (inflation factor λ = 1.05; Supplementary Fig. 4), indicating minimal overall inflation of the genome-wide statistical results.
Several previously reported SNPs were replicated. Recent GWASs have identified a number of SNPs that were significantly associated with the risk of persistent HBV infection. In this study, we confirmed the genetic effects of HLA-DP (index rs9277535, P = 3.8 × 10⁻⁶; and rs3077, P = 2.3 × 10⁻³), HLA-DQ (rs2856718, P = 1.8 × 10⁻³; and rs7453920, P = 5.5 × 10⁻⁶), CFB (rs12614, P = 4.0 × 10⁻³) and CD40 (rs1883832, P = 6.9 × 10⁻³) (Supplementary Table 5), which have been identified in previous GWASs (refs 6,7,10). However, four other SNPs (rs652888, rs1419881, rs3130542 and rs4821116) in or near the EHMT2, TCF19, HLA-C and UBE2L3 loci (refs 8,9) failed to be replicated in this study (all P > 0.05; Supplementary Table 5). These results were unlikely to be caused by imputation error, as these four SNPs were either directly genotyped or imputed with high imputation quality (imputation r² > 0.96). We also reviewed the previous candidate gene-based association studies of persistent HBV infection. In addition to the SNPs in HLA-DP and HLA-DQ, the SNPs in the microRNA gene MIR219A1 at 6p21.32 (ref. 18) were also replicated (P = 2.6 × 10⁻⁶ and 1.3 × 10⁻⁶ for rs421446 and rs107822, respectively; Supplementary Table 6). However, the other previously reported SNPs did not show any consistent associations in this study (Supplementary Table 6). These inconsistent associations between our study and the previous studies may be due to differences in study design or racial diversity.
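The inflation factor quoted above can be illustrated with a minimal sketch. It assumes the common definition of λ as the median observed 1-d.f. chi-square statistic divided by the median expected under the null (about 0.455); the source does not state which estimator was used, and the function names and the evenly spaced P values below are hypothetical.

```python
from statistics import NormalDist, median

def p_to_chi2_1df(p: float) -> float:
    """Convert a two-sided P value to a 1-d.f. chi-square statistic."""
    z = NormalDist().inv_cdf(p / 2.0)  # lower-tail quantile; z is negative
    return z * z

def inflation_factor(p_values):
    """Median observed chi-square divided by the null median (assumed estimator)."""
    expected_median = p_to_chi2_1df(0.5)  # about 0.455
    observed = [p_to_chi2_1df(p) for p in p_values]
    return median(observed) / expected_median

# Hypothetical P values spread evenly over (0, 1), as expected with no inflation
null_like = [(i + 0.5) / 1000 for i in range(1000)]
print(round(inflation_factor(null_like), 3))
```

Values of λ close to 1, such as the 1.05 reported here, indicate that the bulk of the test statistics follow the null distribution.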
Recent GWASs have also identified several SNPs that are associated with HBV-related liver phenotypes. Among those SNPs, several ones in HLA-DP and HLA-DQ that were significantly associated with hepatitis B vaccine response or HBV-related HCC also showed suggestive associations with persistent HBV infection (Supplementary Table 8), reflecting shared genetic risk factors among the HBV-related phenotypes. However, all the other SNPs showed no associations with persistent HBV infection in our GWAS data (Supplementary Table 8), suggesting that the molecular mechanisms among these phenotypes are largely different.
A new susceptibility locus at 8p21.3 was identified. In addition to the previously reported SNPs in HLA-DP, HLA-DQ and MIR219A1, seventy-two loci showed significant associations with P ≤ 1 × 10⁻⁴ in the discovery GWAS stage in this study. We then selected all of these top 72 signals for replication (Supplementary Data 1 and Supplementary Table 9; Methods section) in an independent sample set (replication stage 1, Jiangsu population). Of these 72 tested SNPs, 6 SNPs showed significant associations in the same direction as observed in the GWAS stage (Supplementary Data 1). These 6 SNPs were further genotyped in another sample set (replication stage 2, Guangxi population), and only rs7000921 at 8p21.3 was replicated (Supplementary Data 1). Consistently, rs7000921 showed evidence of association in replication stage 3 (Guangdong population) and stage 4 (Beijing population; Supplementary Data 1). In the combined analyses, rs7000921 (odds ratio (OR) = 0.78, P_meta = 3.2 × 10⁻¹²) reached genome-wide significance for association with persistent HBV infection (Fig. 1a, Table 2 and Supplementary Fig. 5). No evidence of heterogeneity for OR values of rs7000921 was observed among all these sample sets (P_heterogeneity = 0.29; Table 2).
We further investigated the effect of rs7000921 on persistent HBV infection using stratification by sex and age. In the pooled case-control samples, we found no appreciable variation of the effects across the subgroups stratified by age or sex for rs7000921 (P_heterogeneity = 0.088 and 0.26, respectively; Supplementary Table 10). The interaction effects between rs7000921 and viral factors (for example, HBV genotypes and mutations, and viral load) were not assessed because these data were not fully available in our samples. Therefore, the possibility that the association signals detected by rs7000921 reflect some other aspects of disease biology related to persistent HBV infection risk cannot be completely ruled out.
INTS10 was identified as the causative gene at 8p21.3. The SNP rs7000921 is located in an intergenic region on chromosome 8p21.3. Six genes (CSGALNACT1, INTS10, LPL, SLC18A1, ATP6V1B2 and LZTS1) are located within 1 Mb of this SNP (Fig. 1a). To identify potentially causative gene(s) at 8p21.3, we performed eQTL analyses based on liver tissues from 31 patients with persistent HBV infection (Methods section). We found that the protective minor allele C of rs7000921 was significantly associated with elevated transcript levels of INTS10 (P = 6.8 × 10⁻³; Fig. 1b and Supplementary Data 2). This liver eQTL finding was then replicated in an independent sample set of 88 human liver tissues (P = 3.1 × 10⁻³; Fig. 1c, Supplementary Data 2 and Methods section) (refs 20,21). In these two sample sets, the expression of INTS10 in protective allele carriers (TC or CC of rs7000921) was 22-31% higher than that in risk allele carriers (TT; Fig. 1b,c). The associations remained significant even after Bonferroni correction for multiple comparisons. When these two sample sets were pooled together, we achieved a more significant eQTL signal (Fisher's combined P = 2.5 × 10⁻⁴). No significant eQTL signals were found between rs7000921 and the other five genes at 8p21.3. Taken together, these results suggest a potential role for INTS10 in persistent HBV infection. However, the allele-specific changes of INTS10 expression in liver tissues were not seen in lymphocytes of HapMap populations, suggesting that the underlying regulatory mechanism is tissue-specific.
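As a worked check of the pooled eQTL signal, the sketch below applies Fisher's combined probability method to the two liver eQTL P values reported above. It assumes those two values are the only inputs to the combination; the function name is hypothetical. For k independent P values, X = −2 Σ ln p follows a chi-square distribution with 2k degrees of freedom, whose tail has a closed form for even degrees of freedom.

```python
import math

def fisher_combined_p(p_values):
    """Fisher's method: X = -2 * sum(ln p) is chi-square with 2k d.f."""
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # X / 2
    # Closed-form upper tail of a chi-square with even d.f. (2k)
    tail = sum(half_x ** i / math.factorial(i) for i in range(k))
    return math.exp(-half_x) * tail

combined = fisher_combined_p([6.8e-3, 3.1e-3])
print(f"{combined:.1e}")  # 2.5e-04
```

Under that assumption, the result agrees with the combined P of 2.5 × 10⁻⁴ given in the text.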
To investigate candidate causative variants, we performed functional annotation for the genetic variants that are tagged by the index SNP rs7000921 (r² > 0.7) on the basis of publicly available data sets or tools (Supplementary Note and Supplementary Table 11). All the SNPs highly correlated with rs7000921 are located in intergenic regions, of which rs11991803 (r² = 0.739) and rs4922214 (r² = 0.729) are in conserved regions predicted to have high regulatory potential scores (Supplementary Table 11). The eQTL analyses of rs11991803 and rs4922214 showed genotype-specific expression of INTS10, similar to the results for rs7000921 (Supplementary Fig. 6). We further checked the data from the Encyclopedia of DNA Elements (ENCODE) database, and found that rs11991803 lies within a binding site for the transcriptional repressor CCCTC-binding factor that was detected in multiple cell types including the human hepatoma cell line HepG2, suggesting that this variant might be involved in gene regulation (Supplementary Fig. 7). Taken together, these observations suggest that the causative variants at 8p21.
INTS10 suppresses HBV replication. INTS10 is a subunit of the integrator complex, which can interact with RNA polymerase II to mediate 3′ end processing of small nuclear RNAs U1 and U2, the core components of the spliceosome (refs 22-24). In addition, the integrator complex mediates transcriptional initiation, pause release and transcriptional termination at diverse classes of gene targets, including host small nuclear RNAs and coding genes and viral microRNAs (refs 25-27). INTS10 is expressed in a wide range of tissue types, including liver tissues, according to the RNA-Seq Atlas database (ref. 28). However, the specific roles of INTS10 in diseases, for example, in persistent HBV infection, remain unclear.
To investigate whether INTS10 plays a role in HBV replication, we used in vitro cell culture assay systems. The immortalized human hepatocyte cell line L02 was transfected with pAAV-HBV1.2 vectors, together with either pLV-EGFP-INTS10 or pLV-EGFP control vectors (Fig. 2a) (Fig. 2j-r), and in human hepatoma cell line HepG2 which was co-transfected with the pAAV-HBV1.2 vectors (Supplementary Fig. 8). Taken together, these results suggest that INTS10 plays a role in suppressing HBV replication in vitro.

INTS10 suppresses HBV replication in an IRF3-dependent manner.
We then sought to explore the underlying mechanisms by which INTS10 suppresses HBV replication by analysing mRNA expression profiles of liver tissues from 31 HBV carriers (Supplementary Note). Comparing samples with high INTS10 expression to samples with low INTS10 expression, we identified 402 differentially expressed genes (false discovery rate Q value < 0.01 and fold change > 1.2; Supplementary Data 3a), which we used to determine the biological pathways that are altered after INTS10 dysregulation. Intriguingly, we observed significant enrichment and activation of the spliceosome (P_nominal = 1.5 × 10⁻⁵, ranked first) and the retinoic acid-inducible gene-I-like receptor (RLR) signalling pathway (P_nominal = 1.8 × 10⁻³, ranked second; Supplementary Data 3b) in samples with high INTS10 expression. Given the important roles of the integrator complex in the spliceosome, the enrichment of the term spliceosome may reflect the intrinsic physiologic function of INTS10. Notably, however, RLR members such as retinoic acid-inducible gene-I (RIG-I) and melanoma differentiation-associated gene 5 (MDA5) have been shown to sense HBV and activate innate immune signalling in hepatocytes to suppress virus replication (refs 29-31). To determine whether the RLR signalling pathway was regulated by INTS10 in other independent samples, we performed similar analyses in two data sets from the Gene Expression Omnibus (GEO) database (accession numbers GSE25097 and GSE22058), which contain 289 and 96 liver tissues, respectively (Supplementary Data 3c,d). Again, the term spliceosome ranked first in both data sets (P_nominal = 1.0 × 10⁻⁸ and 2.6 × 10⁻³, respectively), and the RLR-related pathway ranked third (P_nominal = 2.6 × 10⁻²) and ninth (P_nominal = 0.20), respectively (Supplementary Data 3b). Taken together, these results suggest that INTS10 may be involved in inhibition of HBV replication through the RLR pathway.
Binding of RLRs to virus-derived nucleic acids activates the downstream signalling pathways in a manner dependent on the adaptor protein mitochondrial antiviral signalling protein (also known as IPS-1, VISA or Cardif), leading to the activation of IRF3 and NF-κB and the subsequent production of type I interferons (IFNs, including IFN-α and IFN-β), type III IFNs (that is, IFN-λ, including IFNL1 (also known as IL29), IFNL2 (IL28A) and IFNL3 (IL28B)) and inflammatory cytokines (ref. 32). Thus, we examined whether IRF3 and NF-κB are activated in HepG2.2.15 cells transfected with the INTS10 expression plasmid. We found that overexpression of INTS10 increased IRF3 phosphorylation (p-IRF3), whereas NF-κB was not activated (Fig. 3a). Consistent with these findings, knockdown of INTS10 by siRNAs led to significantly decreased levels of p-IRF3 without influencing the activity of NF-κB (Fig. 3b). Furthermore, we found that overexpression of INTS10 potently activated the IFN-stimulated response element (ISRE) in reporter assays (P < 0.01; Fig. 3c) and elevated mRNA levels of type III IFNs (IFNL1 and IFNL2/3; P < 0.05; Fig. 3d and Supplementary Table 12), but did not influence the type I IFNs. Consistent with these findings, knockdown of INTS10 significantly reduced the activity of the ISRE reporter and mRNA levels of IFNL1 and IFNL2/3 (Fig. 3e,f). To test whether these observations apply to other types of hepatocytes, we then performed the same experiments in L02 and HepG2 cells co-transfected with HBV1.2 vectors and obtained identical results (Supplementary Figs 9 and 10). Next, we investigated whether the IRF3 pathway is required for the INTS10-elicited immunity against HBV infection. Indeed, we found that the activation of the ISRE reporter, the elevation of mRNA levels of type III IFNs and the reduction of HBV markers by enforced INTS10 expression were weakened when cells were transfected with siRNAs targeting IRF3 (Fig. 3g-m and Supplementary Fig. 11).
Accordingly, in the liver tissues of patients persistently infected with HBV, we observed that the protein levels of INTS10 were positively correlated with those of p-IRF3 (r = 0.38, P = 0.015), but not p-p65 (Fig. 4a-c and Supplementary Table 13). Taken together, these results suggest that INTS10 suppresses HBV replication in an IRF3-dependent manner.
INTS10 correlates with the persistence of HBV infection. To further validate the roles of INTS10 in facilitating HBV clearance, we investigated plasma INTS10 in subjects persistently infected with HBV and in those who spontaneously recovered from HBV infection. Consistent with the eQTL result in liver tissues, we observed elevated INTS10 protein levels in the plasma of rs7000921 C allele carriers (P = 0.020, unpaired t-test; Supplementary Fig. 12). Furthermore, the levels of plasma INTS10 in 216 PIs were significantly lower than those in 80 SRs (P = 2.0 × 10⁻¹⁹, fold change = 2.0; Fig. 4d and Supplementary Table 1b). In addition, we found a significant negative correlation between the INTS10 levels and HBV DNA load in the plasma of PIs with positive HBeAg (Pearson correlation coefficient r = −0.41, P = 2.5 × 10⁻³) and those with negative HBeAg (r = −0.17, P = 0.028; Fig. 4e). Taken together, these results further support our genetic and functional findings, indicating that insufficiency of INTS10 may contribute to the persistence of HBV infection.

Discussion
To date, hundreds of GWASs have been performed to investigate the genetic susceptibilities to common diseases, but very few of them have been re-used since their initial publication. In the present study, by using previously reported GWAS data among Chinese and re-examining phenotypes related to HBV infection, we obtained a relatively large case-control population (1,251 PIs (cases) and 1,057 SRs (controls) in total) to perform a GWAS for the persistence of HBV infection. This provided sufficient statistical power (>90% at a significance level of 0.01) to detect an allele with a minor allele frequency (MAF) of 0.20 that confers an additive 1.2-fold effect on disease risk (Supplementary Fig. 13). Through replication studies among four independent populations (3,905 PIs and 3,356 SRs in total), we identified a novel association signal at 8p21.3 (index SNP rs7000921). Our study demonstrates that novel insights into disease biology can be obtained by re-using previously published GWAS data sets. In a separate study, we genotyped rs7000921 in 689 Chinese subjects without information on HBV infection status (Supplementary Table 1c). The frequency of the rs7000921[C] allele in the SRs (0.283) was similar to that in this random control set (0.266; Supplementary Table 14). To gain insight into the geographic frequency distribution of rs7000921, we compared the SRs with the 14 populations from the 1000 Genomes Project. The frequency of the rs7000921[C] allele in the SRs was similar to that of Asians (0.210-0.309, P = 0.20), but significantly higher than that of Europeans (0.107-0.218, P = 7.8 × 10⁻¹¹) and significantly lower than that of Africans (0.484-0.691, P = 1.4 × 10⁻⁵¹; Supplementary Fig. 14 and Supplementary Table 14). It remains to be determined whether these differences between ethnic groups influence susceptibility to the persistence of HBV infection.
The SNP rs7000921 lies in a noncoding region at 8p21.3. No coding SNPs show high LD with rs7000921 (Supplementary Table 11). As noncoding SNPs may alter gene expression, we used eQTL analyses to explore whether rs7000921 or SNPs tagged by it are cis-acting regulators of nearby gene(s) in human liver. Indeed, we found that the eQTL associations with INTS10 were statistically significant in two independent sample sets of liver tissues, strongly suggesting a causative role of altered INTS10 expression in phenotypes associated with rs7000921. However, it is possible that eQTLs exist for risk SNPs with genes other than INTS10, either within this region or regulated more distally. Additionally, eQTL analyses are complicated by tissue heterogeneity due to variation in genomic copy number, methylation and gene expression. Therefore, caution needs to be applied when interpreting eQTL data. Additional analyses in larger sample sizes and in more cell or tissue types relevant to the aetiology of persistent HBV infection will be needed to confirm the significance of INTS10 as a susceptibility gene for persistence of HBV infection at 8p21.3. INTS10 is a biologically plausible candidate for susceptibility to persistent HBV infection. Although the specific roles of INTS10 in persistent HBV infection have not been reported before, functional assays in this study indicated that INTS10 can significantly decrease levels of HBV markers in liver cells (Fig. 2 and Supplementary Fig. 8). Consistent with the roles of INTS10 in suppressing HBV replication, the levels of plasma INTS10 in PIs were significantly lower than those in SRs (Fig. 4d), and the INTS10 levels were significantly negatively correlated with the HBV DNA levels in the plasma of PIs (Fig. 4e). Taken together, these findings suggest that it is biologically plausible for insufficiency of INTS10 to increase the risk of HBV infection.
Previous studies have revealed the roles of the RLR pathway in control of HBV infection (refs 29-31,33-37). For instance, the RNA sensor RIG-I has been reported to function both as an innate sensor of the 5′-ε region of HBV pregenomic RNAs to induce type III IFNs but not type I IFNs, and as a direct antiviral factor that counteracts the interaction of HBV polymerase with the 5′-ε region of HBV, which consistently suppressed HBV replication (ref. 29). Meanwhile, the HBV X protein can target the RLR pathway to suppress virus-triggered IRF3 activation and IFN-β induction (refs 33-36). Consistent with these previous studies, our study demonstrated for the first time that INTS10 promotes HBV clearance through activation of IRF3 in liver cell models. Further studies are warranted to identify the interacting protein(s) and to elucidate the precise mechanisms of INTS10 against HBV infection in hepatocytes and/or other cell types, including dendritic cell subsets.
In summary, our GWAS replicates the previously identified 6p21.32, 6p21.33
Case-control populations. Subjects who had been positive for both HBsAg and anti-HBc immunoglobulin G for at least 6 months were defined as PIs (cases). Those who were negative for HBsAg and positive for both anti-HBs and anti-HBc immunoglobulin G were defined as SRs (controls). In the discovery GWAS stage, the genotype data were derived from several previously published GWASs and in-house data (refs 12-15). By screening for HBV markers in the plasma of those subjects whose plasma samples were available, we identified 1,251 cases and 1,057 controls (Supplementary Table 1a). With the same sample inclusion and exclusion criteria as those used in the discovery GWAS stage, we identified a total of 3,905 cases and 3,356 controls in the replication stage (Supplementary Table 1a) (refs 16,17).
Random controls without information on HBV infection. This population consists of 689 subjects, which was used to evaluate the frequency of the protective rs7000921 allele [C] in naïve controls in China. All the subjects were unrelated ethnic adult Chinese randomly recruited from Guangxi province, and were not screened for HBV markers in previous studies. The male/female ratio and the mean age (s.d.) of these random controls are 1.4 (403/286) and 54.9 (11.8) years old, respectively (Supplementary Table 1b).
Protein levels of INTS10, p-p65 and p-IRF3 were measured by immunohistochemistry (IHC) in 40 non-tumour liver tissues of patients with HBV-related HCC (Supplementary Table 13) collected from the Jinling Hospital (Nanjing City, China). The levels of plasma INTS10 were measured by enzyme-linked immunosorbent assays in 216 PIs and 80 SRs, who were randomly selected from the Guangdong population in the replication stage (Supplementary Table 1c).
Quality controls in the GWAS stage. We performed stringent quality controls on both samples and SNPs to ensure robust subsequent association tests. Samples were removed if they (i) had an overall genotyping rate of <90%; (ii) showed sex discrepancies; (iii) showed unexpected duplicates or relatives (PI_HAT > 0.025) or (iv) were identified as outliers. SNPs were excluded if they had (i) a call rate of <90%; (ii) a MAF of <0.05; or (iii) a P value of <1 × 10⁻⁴ in a Hardy-Weinberg equilibrium test among controls. After quality controls, a total of 1,251 cases and 1,057 controls remained, and 616,583, 286,713, 694,784, 590,809 and 441,776 SNPs remained for GWAS populations 1, 2, 3, 4 and 5, respectively, for subsequent analyses (Supplementary Table 2).
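The SNP-level filters listed above can be sketched for a single SNP as follows. This is an illustration only: it assumes a 1-d.f. chi-square goodness-of-fit test for Hardy-Weinberg equilibrium (the text does not say whether a chi-square or exact test was used), expects the genotype counts passed in to come from controls, and uses a hypothetical function name and example counts.

```python
import math

def snp_qc(n_aa, n_ab, n_bb, n_missing,
           min_call_rate=0.90, min_maf=0.05, min_hwe_p=1e-4):
    """Return (passes, call_rate, maf, hwe_p) for one SNP's genotype counts."""
    n_called = n_aa + n_ab + n_bb
    call_rate = n_called / (n_called + n_missing)
    freq_b = (n_ab + 2 * n_bb) / (2 * n_called)
    freq_a = 1.0 - freq_b
    maf = min(freq_a, freq_b)
    # Expected genotype counts under Hardy-Weinberg equilibrium
    expected = (n_called * freq_a ** 2,
                2 * n_called * freq_a * freq_b,
                n_called * freq_b ** 2)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    hwe_p = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of 1-d.f. chi-square
    passes = (call_rate >= min_call_rate and maf >= min_maf
              and hwe_p >= min_hwe_p)
    return passes, call_rate, maf, hwe_p

# Hypothetical counts: 490 AA, 420 AB, 90 BB and 10 missing genotypes
print(snp_qc(490, 420, 90, 10))
```

The default thresholds mirror the call rate, MAF and Hardy-Weinberg cut-offs stated in the text.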

SNP imputation.
To increase the number of overlapping SNPs among data sets and generate more genotypes in the discovery GWAS stage, we performed imputation on the GWAS data sets using a Markov Chain based haplotyper (MACH; version 1.0.16) (ref. 38) with haplotypes derived from genotypes of samples of Asian ancestry in the HapMap phase II, the HapMap phase III and the 1000 Genomes Project. Before imputation, all the SNPs were checked for strand inconsistencies. Then, data were imputed by a two-stage design for all the data sets. The first stage generated error and crossover maps as parameter estimates, which were used during the second stage of the imputation to generate maximum likelihood estimates of allele numbers per SNP on the basis of reference haplotypes for the data sets. The imputation was performed for each data set separately. Cases and controls within each data set were imputed together. For detailed descriptions, see Supplementary Note.
Assessment of accuracy of array genotyping and imputation. To verify the array genotyping and imputation results, we randomly selected 274 samples from GWAS population 2, and then resequenced a ~127-kb non-repeat genomic region at the 1p36.22 locus in them using deep sequencing. BWA (v0.5.9) was used to align high-quality reads to the human genome (hg19 build) with default parameters. SAMtools (v0.1.8) was used to remove PCR duplicates. To assess the array genotyping accuracy, the Kappa test was used to evaluate the agreement between the genotypes determined by array genotyping and those determined by deep sequencing. To assess the imputation accuracy, we calculated the Pearson's correlation coefficients between the 'allele dosages' from the imputed data based on the 1000 Genomes Project and the non-reference allele proportions of SNPs in the deep sequencing. For detailed descriptions, see Supplementary Note.
Genome-wide genetic association analyses in the GWAS stage. We combined all five GWAS subgroups to perform joint association analyses. Population substructure was characterized using principal component analyses as implemented in EIGENSTRAT (version 3.0) (ref. 39). We performed genome-wide association analyses at every SNP using MACH2DAT (ref. 40) with imputation results based on HapMap phase II, HapMap phase III and 1000 Genomes Project data, respectively. To account for imputation uncertainty, we used 'allele dosages' as a primary predictor of persistent HBV infection in logistic regression models adjusted for age, sex and admixture principal components. Haploview (v4.2) and R were used to generate the Manhattan plot of −log10(P) and the quantile-quantile plot, respectively. We also performed meta-analysis in the GWAS stage. For detailed descriptions, see Supplementary Note.
remained. From these papers, the SNPs that were reported to be significantly associated with chronic or persistent HBV infection were selected. Then the SNPs with MAF < 0.01 in the East Asian population according to the 1000 Genomes Project were removed. Finally, a total of 69 SNPs remained for reviewing the consistency of associations in this study (Supplementary Table 6).
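The imputation-accuracy comparison described above can be illustrated with a small sketch. It assumes the usual definition of an allele dosage as the expected allele count P(AB) + 2·P(BB) from genotype posterior probabilities; the posterior values, read proportions and function names are hypothetical and are not taken from the study.

```python
import math

def allele_dosage(p_ab: float, p_bb: float) -> float:
    """Expected count (0 to 2) of allele B from genotype posterior probabilities."""
    return p_ab + 2.0 * p_bb

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical posteriors (P(AB), P(BB)) for one SNP in five subjects
posteriors = [(0.05, 0.00), (0.90, 0.05), (0.10, 0.88), (0.50, 0.02), (0.02, 0.97)]
dosages = [allele_dosage(p_ab, p_bb) for p_ab, p_bb in posteriors]
# Hypothetical non-reference allele proportions among sequencing reads
read_proportions = [0.02, 0.48, 0.95, 0.30, 1.00]
print(round(pearson_r(dosages, read_proportions), 3))
```

Because the Pearson coefficient is invariant to linear rescaling, dosages on a 0-2 scale can be compared directly with read proportions on a 0-1 scale.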
Text mining of GWASs on HBV-related phenotypes. We searched the studies based on GWAS Catalog (http://www.genome.gov/26525384). Eight studies on HBV-related phenotypes were selected. We reviewed these papers and selected 25 significantly associated SNPs for reviewing the consistency of associations in this study (Supplementary Table 8).
HLA alleles analyses. HLA alleles were predicted from dense SNPs genotypes using the R package HIBAG (http://cran.r-project.org/web/packages/HIBAG/ index.html). We selected a threshold of 0.5 as a value that has modest effects on both call rate and accuracy. ORs and 95% confidence intervals were calculated in logistic regression model.
HLA-DRB1*1301 and*1302 alleles (corresponding to HLA-DR13) were reported to have a protective effect against persistent HBV infection in different populations 41,42 . However, HLA-DR13 were failed to be predicted with the R package HIBAG in this study. We then selected rs11752643 as a proxy of HLA-DR13 (rs11752643 is closely linked with HLA-DR13, with r 2 ¼ 0.83 in healthy Japanese samples) 6 . We found that rs11752643 showed marginal association with persistent chronic HBV infection (P ¼ 0.089).
Selection of SNPs for the replication studies. In the joint analyses, a locus not reported previously to conferring susceptibility to persistent HBV infection was chosen for replication if it had a SNP with a P value r1.0 Â 10 À 4 in the GWAS stage. SNPs showing independent association (Pr0.05) by conditional analyses with adjustment of the most significant SNP within each region went forward to replication stage (Supplementary Note). These steps led to the identification of 72 candidate SNPs forward to the replication stage. In the meta-analysis, the same steps were performed. This led to the identification of 43 candidate SNPs, all of which were overlapped with the 72 most significant SNPs in joint analyses.
Genotyping and quality controls in the replication stage. In replication stage 1, one SNP within the MHC region was genotyped using TaqMan assays, and the remaining 71 SNPs were genotyped using Sequenom assays, of which five failed. Among the 67 successfully genotyped SNPs, six SNPs survived (P < 0.05 and with effects in the same direction as in the GWAS stage) and went forward to replication stage 2. In this stage, four SNPs were genotyped using Sequenom assays and two SNPs using TaqMan assays. The SNP rs7000921 survived replication stage 2 and then went forward to the subsequent replication stages 3 and 4, which used TaqMan assays.
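The survival criterion used between replication stages (P < 0.05 and an effect in the same direction as in the GWAS stage) can be sketched as a simple filter. The SNP labels and all numbers below are hypothetical and are not taken from the study; the function name is likewise an assumption.

```python
def survives_replication(gwas_or, rep_or, rep_p, alpha=0.05):
    """True if the replication P value is below alpha and both odds ratios
    lie on the same side of 1 (same direction of effect)."""
    same_direction = (gwas_or - 1.0) * (rep_or - 1.0) > 0.0
    return rep_p < alpha and same_direction


# Hypothetical candidates: (GWAS odds ratio, replication odds ratio, replication P)
candidates = {
    "snp_a": (0.80, 0.85, 0.010),  # significant, same direction
    "snp_b": (0.80, 1.10, 0.020),  # significant, opposite direction
    "snp_c": (1.25, 1.15, 0.300),  # same direction, not significant
}

survivors = [
    name
    for name, (gwas_or, rep_or, rep_p) in candidates.items()
    if survives_replication(gwas_or, rep_or, rep_p)
]
print(survivors)  # ['snp_a']
```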
In the Sequenom assays, the sample DNAs were amplified by multiplex PCR; the products were then used for locus-specific single-base extension reactions, and alleles were detected using matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (Sequenom). TaqMan assays were performed according to the manufacturer's instructions (Applied Biosystems).
In each replication stage, the genotype data were subjected to the same quality control analyses as in the GWAS stage. The cluster patterns of the genotyping data from the Sequenom and TaqMan assays were visually checked to confirm their good quality. Association analyses were carried out under an additive model using PLINK (v1.07).
Meta-analysis of data generated from both the GWAS and replication stages was conducted to assess the pooled genetic effects using the meta-analysis helper (METAL) software 43 . Cochran's Q statistic was calculated to test for between-group heterogeneity.
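The pooling and heterogeneity test described above can be sketched as follows. The text does not state which weighting scheme was used in METAL, so this example assumes a fixed-effect, inverse-variance weighted analysis of per-stage log odds ratios; the function name and the input values are hypothetical.

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance fixed-effect meta-analysis of log odds ratios.

    Returns the pooled log OR, its standard error, a two-sided P value from
    the normal approximation, Cochran's Q, its degrees of freedom and I^2.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal P value
    # Cochran's Q: weighted squared deviations of each study from the pooled effect.
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return {"beta": beta, "se": se, "p": p, "q": q, "df": df, "i2": i2}


# Hypothetical per-stage estimates (log odds ratios and standard errors).
result = fixed_effect_meta(
    betas=[-0.30, -0.22, -0.25, -0.18, -0.27],
    ses=[0.08, 0.10, 0.09, 0.12, 0.11],
)
print("pooled OR:", math.exp(result["beta"]), "Q:", result["q"], "df:", result["df"])
```

Under the null hypothesis of homogeneity, Q approximately follows a chi-square distribution with k − 1 degrees of freedom for k groups, so a P value for heterogeneity can be obtained from that distribution.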
The potential modification effects of sex and age on the association between rs7000921 and persistent HBV infection risk were assessed both by adding interaction terms in the logistic regression model and by separate analyses of subgroups of subjects stratified by these factors.
For detailed descriptions, see Supplementary Note.

Statistical analyses.
Fisher's exact test or the χ² test was used for the analyses of contingency tables, depending on the sample sizes. P values were calculated by two-sided Student's t-test for means of age, activities of reporter genes, expression levels of HBV markers and mRNA expression levels of IFNs. Variation within each group of data was estimated using the F-test. If measured values did not meet the assumptions of normality and homogeneity of variances, log-transformation was applied before the t-tests were performed. P < 0.05 was considered statistically significant. P values for the correlation between the genotypes of rs7000921, rs11991803 and rs4922214 and the mRNA levels of the indicated genes in liver tissues (log2 transformed) were determined using linear regression analyses adjusted for sex and age; these were considered significant when below 0.05 after Bonferroni correction, that is, after multiplying by the number of comparisons. Spearman's test was used to evaluate the correlation coefficients (r) and the two-tailed P values for the expression levels of INTS10, p-p65 and p-IRF3, which were non-normally distributed variables. Pearson's test was used to evaluate the correlation coefficient (r) and the two-tailed P value for plasma INTS10 and HBV DNA load; log-transformation was applied before Pearson's test. For detailed descriptions of genotype-expression analyses, functional annotations, pathway enrichment analyses and functional assays (including cell transfections, western blotting assays, detection of HBV DNAs and RNAs, quantitative real-time PCR assays, enzyme-linked immunosorbent assays, luciferase reporter gene assays and immunohistochemistry assays), see Supplementary Note.
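Three of the procedures named above are simple enough to sketch directly: Bonferroni correction by multiplying each P value by the number of comparisons, Pearson correlation on log-transformed values, and Spearman correlation as a Pearson correlation of ranks. The following is a minimal sketch; the function names and the toy values are hypothetical, and the base of the logarithm (10) is an assumption because the text does not specify it for the plasma measurements.

```python
import math


def bonferroni(p_values):
    """Multiply each P value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical toy values: a plasma protein level and a viral DNA load.
protein = [1.0, 10.0, 100.0, 1000.0]
viral_load = [1e7, 1e6, 1e5, 1e4]
r_log = pearson_r([math.log10(v) for v in protein],
                  [math.log10(v) for v in viral_load])
print("Pearson r on log10 values:", r_log)
print("Bonferroni-adjusted:", bonferroni([0.01, 0.04, 0.5]))
```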