Comparison of the Characteristics of Hepatitis B Virus Whole Genomes Derived From Patients With Hepatocellular Carcinoma and Chronic Hepatitis B Infection

Background: Hepatocellular carcinoma (HCC) caused by hepatitis B virus (HBV) infection is a serious public health problem in China. The aims of the present study were to report HCC prevalence in China and to characterize the whole-genome sequences of HBV derived from patients diagnosed with HCC and from those with chronic hepatitis B (CHB) infection. Methods: Patients in the HCC group and the CHB group were recruited from national HBV surveillance sites and were matched by age, gender, and region. All serum samples were tested for serological markers. Polymerase chain reaction (PCR) was used to amplify the complete HBV genome sequences. The full-length HBV genomes were then analyzed with bioinformatics and statistical software. Results: Serum samples were collected from 51 patients with HBV-related HCC and 76 patients with CHB. All patients were from six provinces (Guangxi, Hebei, Henan, Hunan, Qinghai, and Shanghai) in China. Sequencing and analysis of the full-length HBV genomes revealed the presence of four genotypes (B, C, CD, and I). The distribution of HBV genotypes and the positivity rate of the hepatitis B e antigen differed between the two groups. A total of 148 substitution sites with statistically significant differences were identified between the HCC and CHB groups. In addition, three mutational sites associated with HCC (F22I/L/P in the pre-S2 region, and P33S/T and S144A/T/V in the X region) were identified. Deletions in the pre-S and X regions were found in both HCC and CHB patients; however, deletions in the X region were more common in the HCC group than in the CHB group. Conclusions: In this study, the hotspot mutations associated with a high risk of HCC mostly occurred in the sequences analyzed and included some substitutions (C1470A/T, T1803A/G, and C1804T) that have not been previously reported. The aa33 and aa144 substitutions in the X region may be new predictive markers for HCC. The results of our study would


Background
Hepatitis B Virus (HBV) is considered to be a major global health problem with more than 250 million chronic HBV (CHB) carriers and more than one million HBV-associated human deaths per year [1]. Hepatocellular carcinoma (HCC) is the most common malignancy of the liver. In 2018, HCC was reported as the fourth leading cause of cancer-related death worldwide, with more than 841,000 new diagnoses and 782,000 deaths globally per year [2]. HCC is a serious worldwide health burden due to the high mortality and low resectability rates [3]. Among all the risk factors, HBV infection is the most important one for HCC, accounting for approximately 80% of all cases [4].
Immunization against HBV was implemented in China 30 years ago and was officially integrated into the China National Immunization Plan in 2002. Nonetheless, there are approximately 93 million people currently infected with HBV in China, including about 28 million chronic HBV cases, who are at increased risk for the development of HCC [5].
Various types of mutations to the HBV genome have been reported, resulting in amino acid substitutions during long-term infection, some of which could serve as markers to predict the development of HBV-associated HCC [6,7], such as mutations in the pre-surface antigen (pre-S) region (i.e., deletion to the pre-S1 [W4P/R] and pre-S2 start codon deletion), the pre-core region (i.e., G1896A at nucleotide 1896), or a double mutation to the basal core promoter (BCP) and X regions (V5M) [8][9][10].
To investigate the effects of various HBV mutations on the development of HCC, this case-control study reports current HCC prevalence in representative regions and analyzes the complete HBV genomes isolated from HCC and CHB patients.

Field background
Patients with HCC in this study were recruited from six hepatitis B surveillance sites in China (Guangxi, Hebei, Henan, Hunan, Qinghai, and Shanghai), where the incidence of HBV infection has been under surveillance since the 1980s. Patients with HCC were diagnosed in provincial hospitals or tumor specialist hospitals. Basic information was recorded on a questionnaire prepared beforehand, including name, gender, birth date, address, ethnicity, immunization history, and medical information, and 5 mL of venous blood was taken from each participant. All HCC subjects were enrolled between January 2012 and December 2015.

Serological markers
All serum samples were tested for three HBV markers (HBsAg, anti-HBe, and HBeAg), as well as anti-HCV antibodies and HDV-IgG. HBV markers were detected with MEIA kits on an ARCHITECT i2000SR immunoassay analyzer (Abbott Diagnostics, Chicago, IL, USA). The other markers were detected using enzyme-linked immunosorbent assays (Beijing Wantai Biological Pharmacy Enterprise Co., Ltd., Beijing, China).

Amplification and sequencing of the complete HBV genome from HCC patients and CHB patients
First, representative HCC patients were selected for whole-genome sequencing as the case group. CHB patients who were matched to the case group by age, sex, and region were selected as the control group. All of these sera were positive for hepatitis B surface antigen (HBsAg). Patients in the CHB group had been HBsAg-positive for more than 6 months.
In total, 127 patients (51 in the HCC group and 76 in the CHB group) were selected to obtain complete genome sequences. HBV DNA was extracted from 200 µL of serum using a QIAamp DNA blood mini-kit (Qiagen, Hilden, Germany) in accordance with the manufacturer's instructions. The full-length HBV genome was divided into two parts, which were amplified using the nested polymerase chain reaction (PCR) method as we previously described [11]. The whole-genome nucleotide sequences of the 127 HBV isolates reported in this article have been deposited in the National Center for Biotechnology Information GenBank database under accession numbers MT644995 to MT645072.

Genotyping of HBV
HBV genotypes were determined by comparison with a set of 88 reference sequences (genotypes A-I) retrieved from the GenBank database [12]. Phylogenetic trees were reconstructed by the maximum likelihood (ML) method implemented in the IQ-TREE package under the GTR+I+G nucleotide substitution model, which was selected by ModelFinder [13,14]. Support for the ML tree was inferred by bootstrapping with 1000 replicates. To detect recombination in the HBV genome sequences, the sequences were investigated using SimPlot v3.5.1 software and the jpHMM (jumping profile Hidden Markov Model) method [15,16].

Nucleotide and amino acid mutation analysis
The substitution proportion of every nucleotide site was calculated for all 127 HBV whole-genome sequences. First, the frequency distribution of the nucleotides observed across the 127 sequences was mapped for each of the 3215 nucleotide sites of the HBV genome. The nucleotide with the largest proportion at a site was taken as the prototype, and any other nucleotide at that position was defined as a substitution. If a sequence had a deletion at a nucleotide site, it was not included in the calculation for that site. Second, the substitution proportion of every nucleotide site was calculated for the HCC group and the CHB group separately. Subsequently, the sites with a nucleotide substitution rate of more than 5% in either the HCC group or the CHB group were compared between the two groups. A bar chart was drawn for the sites with a significant difference in nucleotide substitution rate.
In previous reports [17,18], some amino acid substitutions in the pre-S, pre-C/C, and X regions were shown to be associated with HCC. In the present study, these nucleotide sites were translated into the corresponding amino acid sites within the pre-S, pre-C/C, and X regions. The amino acid substitution rate was then calculated for the HCC group and the CHB group separately. Finally, the amino acid substitution rates were compared between the two groups.
Sequences with pre-S or X region deletions were aligned with reference HBV sequences of the same genotype. All nucleotide sequences in the datasets of this study and the reference HBV sequences were analyzed using MEGA 7.0 software.
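The per-site calculation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the sequences are assumed to be already aligned to equal length, `-` and `N` are assumed to mark deleted or unresolved positions, ties for the most frequent nucleotide are broken arbitrarily, and the function names and toy sequences are hypothetical.

```python
from collections import Counter

# Assumption: these characters mark a deleted or unresolved position.
GAP_CHARS = {"-", "N"}


def prototype_per_site(seqs):
    """Return the most frequent nucleotide at each alignment column, ignoring gaps."""
    prototypes = []
    for column in zip(*seqs):
        counts = Counter(base for base in column if base not in GAP_CHARS)
        prototypes.append(counts.most_common(1)[0][0] if counts else None)
    return prototypes


def substitution_rates(seqs, prototypes):
    """Per-site share of non-deleted sequences whose base differs from the prototype."""
    rates = []
    for column, proto in zip(zip(*seqs), prototypes):
        observed = [base for base in column if base not in GAP_CHARS]
        if proto is None or not observed:
            rates.append(0.0)
        else:
            rates.append(sum(base != proto for base in observed) / len(observed))
    return rates


def candidate_sites(hcc, chb, threshold=0.05):
    """Sites whose substitution rate exceeds the threshold in either group.

    Prototypes are taken from the pooled sequences; rates are per group.
    Returns tuples of (1-based site, prototype, HCC rate, CHB rate).
    """
    prototypes = prototype_per_site(hcc + chb)
    hcc_rates = substitution_rates(hcc, prototypes)
    chb_rates = substitution_rates(chb, prototypes)
    return [
        (i + 1, proto, h, c)
        for i, (proto, h, c) in enumerate(zip(prototypes, hcc_rates, chb_rates))
        if h > threshold or c > threshold
    ]


# Toy four-column alignment (hypothetical data, for illustration only).
hcc = ["ACGT", "ACTT", "A-TT"]
chb = ["ACGT", "ACGT", "ACGA"]
sites = candidate_sites(hcc, chb)
```

In the toy alignment, the deletion in the third HCC sequence is skipped at site 2 rather than counted as a substitution, which mirrors the exclusion rule stated in the text. The sites returned here would then be passed to the group comparison test.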

Statistical analysis
Pearson's χ² test or Fisher's exact test was used for analysis, as appropriate. All statistical tests were two-tailed and a probability (p) value of <0.05 was considered statistically significant. All analyses were performed using IBM SPSS Statistics for Windows, version 22.0 (IBM Corp., Armonk, NY, USA).
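For readers who want to reproduce a 2x2 comparison without SPSS, the following is a minimal pure-Python sketch. It rests on assumptions that the text does not state: the rule for "as appropriate" is taken to be the common convention of switching to Fisher's exact test when any expected cell count is below 5, no continuity correction is applied, and no row or column total is zero. The function names are hypothetical, and the example counts are the HBx deletion counts reported in this study (7 of 51 HCC patients and 4 of 76 CHB patients).

```python
from math import comb, erfc, sqrt


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k events in the first row.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum all tables that are no more probable than the observed one.
    p = sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def pearson_chi2(a, b, c, d):
    """Pearson chi-square statistic and p-value (1 df, no continuity correction)."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    return stat, erfc(sqrt(stat / 2))


def compare_groups(a, b, c, d):
    """Pick the test by the smallest expected count (assumed cut-off: 5)."""
    n = a + b + c + d
    min_expected = min(r * k for r in (a + b, c + d) for k in (a + c, b + d)) / n
    if min_expected < 5:
        return "fisher", fisher_exact_two_sided(a, b, c, d)
    return "chi2", pearson_chi2(a, b, c, d)[1]


# HBx deletions: 7 with / 44 without in HCC, 4 with / 72 without in CHB.
method, p_value = compare_groups(7, 44, 4, 72)
```

With these counts the smallest expected cell count is below 5, so the sketch falls back to Fisher's exact test, and the resulting p-value is above 0.05, in line with the non-significant difference reported for X region deletions.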

Ethical approval
The study protocol was approved by the Ethics Committee of the Chinese Center for Disease Control and Prevention and conducted in accordance with the ethical guidelines of the 1975 Declaration of Helsinki. The purpose of the study and the right to information were explained to the participants by the research staff. Written informed consent was obtained from each participant before the interview and venous blood collection.

Baseline features of the study population
In total, 3262 HCC patients were identified from 2012 to 2015 in the six surveillance regions. People aged >40 years had a significantly higher HCC prevalence than those aged <40 years (P < 0.001). Of the six regions, Guangxi, which is located in southwest China, had the highest HCC prevalence. Details of the HCC patients in the six regions are listed in Table 1.
From 2012 to 2015, the average age-specific prevalence rate of HCC in the surveillance regions ranged from 0.07/100000 to 31.29/100000 (Table 1). The prevalence of HCC in the population <70 years old increased significantly with age (Trend χ² = 4451.26, P < 0.001).
A total of 127 whole-genome sequences of HBV DNA were successfully amplified from serum samples of 51 patients with HCC and 76 with CHB. No anti-HCV antibody or HDV-IgG was detected in any of these 127 serum samples, and none of the patients had a history of alcoholism.
The gender, age, region, HBV genotype, and HBeAg status of the 51 HCC and 76 CHB patients are shown in Table 2. There was no obvious difference in sex, age, or region between the two groups. The HBeAg-positive rate was significantly higher in the CHB group than in the HCC group. Of the 51 patients in the HCC group, two (3.92%) were positive for the HBV/B genotype, 43 (84.31%) for the HBV/C genotype, and six (11.76%) for other genotypes (HBV/CD, HBV/I). Of the 76 patients in the CHB group, 14 (18.42%) were positive for the HBV/B genotype, 55 (72.37%) for the HBV/C genotype, and seven (9.21%) for other genotypes (HBV/CD, HBV/I). The HBV/C genotype was predominant in both groups. The proportions of genotypes did not differ significantly between the two groups (p > 0.05; Table 1). The geographical location and genotype distribution are shown in Fig 1.

Comparison of HBV DNA nucleotide substitutions between the HCC and CHB groups
Nucleotide substitutions of the complete genome with frequencies of >5% in HCC and CHB patients were compared, and those with significant differences (p < 0.05) are visualized in Fig 2. There were 148 (4.60%) sites with significant differences between the two groups. The 53 sites whose substitution rates were significantly greater in the HCC group than in the CHB group were mainly located in the C-terminus of HBx (T1802C, T1803A/G, C1804T, G1896A, G1899A, C1969T) and C regions (T2263C, A2269G, T2278C/G, C2281T, T2284A/C, T2287C, etc.) C region (T53A/C, C502A/G, A929T, G1896A, G1899A, etc.), while the 95 sites whose substitution rates were significantly greater in the CHB group than in the HCC group were mainly located in the pre-S, S, X and P regions.

Comparative analysis of HCC-related nucleotide and amino acid substitutions
The deduced amino acids corresponding to the nucleotide sites in the pre-S, pre-C/C, and X regions were analyzed. Genotype-specific amino acid substitutions were removed. The amino acid substitution rates with significant differences between the HCC and CHB groups are shown in Table 3. All the nucleotide sites shown in Table 3 were associated with HCC, as previously reported [10,19,20]. The substitution rates of three sites (aa22, aa33, and aa144) were greater in the HCC group than in the CHB group (47.83% vs. 28.38%, 29.41% vs. 9.21%, and 28.57% vs. 9.46%, respectively). The amino acid substitutions at position aa22 were located in the pre-S2 region, while those at positions aa33 and aa144 were located in the X region and were distributed among different genotypes.

Association of different pre-S deletion types with HCC
As there were few sequences in the genotype B, I, and CD subgroups, only genotype C was analyzed. Most of the pre-S deletion mutations were in the genotype C subgroup. Pre-S1 deletions were more frequent in the CHB group (HCC vs. CHB: 11.63% vs. 25.45%), while pre-S2 deletions were more frequent in the HCC group (HCC vs. CHB: 25.58% vs. 20.00%). Epitope mapping revealed frequent deletions in five epitopes. Deletions in pS1-B1, pS1-B2, pS2-B2, pS2-B3, and pS1-T1 were more common in the HCC group than in the CHB group. Details are shown in Table 4.
Functional mapping revealed that the frequencies of deletions in three functional domains (CBF, NBS, and pHSA) were higher in the HCC group than in the CHB group, in contrast to the other domains (L start codon, HBS, S promoter, HSC70, CAD, M start codon, and VS start codon), which showed no significant differences between the two groups. As shown in Table 4, the frequency of HBS deletion was significantly higher in the CHB group than in the HCC group (p < 0.05). Notably, no deletion mutants affecting the functional domains of the L and HBS start codons were found in the HCC group.

Characteristics of X region deletions
The 16.5-kDa HBx protein is composed of 155 amino acids encoded by 465 base pairs (nt 1374-1838). A map of deletions in the HBx protein region of the CHB and HCC groups is shown in Fig. 3. C-terminal deletions are among the most frequently reported mutations of HBx and were detected in four and seven patients in the CHB and HCC groups, respectively. Most of these patients were infected with genotype C, with the exception of one with genotype CD (in which the X ORF was in the genotype C fragment). The deletion rate was higher in the HCC group than in the CHB group (13.73% vs. 5.26%, respectively), although this difference was not significant. All four deletions in the CHB group were located in the C-terminus of HBx (aa 126-135). Meanwhile, the C-terminal deletions in the seven patients of the HCC group covered a larger range: two were truncations of the C-terminus of HBx, and the other four were concentrated at the C-terminus of HBx (aa104-154).
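The coordinates above make it straightforward to relate a nucleotide position in the X region to an HBx codon. The sketch below assumes the numbering stated in this section (X ORF at nt 1374-1838, 155 codons) and simple linear numbering within the ORF; the function names are hypothetical. It also recomputes the deletion rates from the reported patient counts.

```python
# Coordinates of the X ORF as stated in the text (nt 1374-1838).
HBX_START, HBX_END = 1374, 1838


def hbx_codon(nt):
    """Map a genome nucleotide position to (HBx amino acid number, position in codon)."""
    if not HBX_START <= nt <= HBX_END:
        raise ValueError(f"nt {nt} lies outside the X ORF ({HBX_START}-{HBX_END})")
    offset = nt - HBX_START
    return offset // 3 + 1, offset % 3 + 1


def deletion_rate(with_deletion, total):
    """Percentage of patients carrying a deletion."""
    return 100 * with_deletion / total


# Substitution sites named in this study.
aa_1470 = hbx_codon(1470)
aa_1803 = hbx_codon(1803)
aa_1804 = hbx_codon(1804)

# HBx deletion carriers: 7 of 51 HCC patients and 4 of 76 CHB patients.
hcc_rate = round(deletion_rate(7, 51), 2)
chb_rate = round(deletion_rate(4, 76), 2)
```

Under this numbering, nt1470 falls in codon 33 and nt1803 and nt1804 fall in codon 144, which matches the pairing of C1470A/T with P33S/T and of T1803A/G and C1804T with S144A/T/V reported in this study.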

Discussion
The etiology and pathogenesis of HCC are complex because of the many related risk factors, such as hepatitis B virus (HBV) infection, aflatoxin exposure, and physical and chemical factors, especially alcoholism and other unhealthy lifestyle habits. HBV can promote carcinogenesis through chromosomal instability, numerous mutations, and the interaction between the HBx protein and host proteins [21]. In addition, the indirect effects of HBV infection include chronic inflammation and oxidative stress, which can subsequently lead to varying degrees of hepatic injury [22]. Chronic hepatitis B (CHB) infection is a strong risk factor for the development of HCC, mostly due to HBV nucleotide level, HBV genetic mutations, positivity for the hepatitis B e antigen (HBeAg), HBV genotypes, and co-infection with hepatitis C virus [23].
The six sampling and surveillance regions selected in this study were long-term HepB immunization regions. Previous HBV epidemiological surveys showed that Guangxi and Qinghai had a high HBV prevalence, Hunan and Henan had an intermediate prevalence, and Hebei and Shanghai were low HBV endemic regions [24]. These six surveillance sites therefore represent regions with high, medium, and low levels of HBV infection in China. In this study, the prevalence of HCC was much higher in Guangxi and Hunan than in the other regions, and it could be presumed that the higher local HBsAg positivity rate was the main cause. However, Qinghai is also a highly endemic area of HBV, in which more than 95% of the resident population is Tibetan. It was reported that the HBV infection rates of Tibetans were significantly higher than those in any other region of China [25]. Nevertheless, in this study, the HCC prevalence in Qinghai was not as high as in Guangxi and Hunan. This result may be because the main prevalent genotype on the Qinghai-Tibet Plateau is the CD recombinant [11]. It has been reported that the incidence of HCC in a HepB-vaccinated population was significantly lower than that in a non-vaccinated control population, and that the risk of HCC was reduced by 84% [26]. Therefore, in areas with a high prevalence of HBV infection and HCC, especially among people with a family history of hepatitis B, it is very important to monitor the status and progression of liver disease for the early diagnosis of HCC [27].
Integration of the HBV genome is currently believed to be an early event in HBV chronic infection. Notably, mutations and deletions to the HBV genome are associated with an increased risk for the development of HCC and the clinical severity of other hepatic diseases [28]. Most mutations to the HBV genome are generated due to the lack of proofreading capacity of HBV polymerase or host immune pressure [19].
In the present study, complete HBV genome sequences were obtained from samples of 51 HCC patients and 76 CHB patients. To determine whether there was any difference in HBV sequences between the HCC and CHB groups, the nucleotide substitutions of the HBV whole-genome sequences were compared across the four overlapping ORFs.
The substitution rates of the nucleotide sites located in the pre-C and C regions were mostly higher in the HCC group than in the CHB group. Most pre-C/C mutations are generated during HBeAg seroconversion. Several types of HBV pre-C/C mutations, such as G1896A, A1762T, and G1764A, were reportedly related to disease severity [9,19,29]. Also, the HBeAg-positive rate was significantly lower in the HCC group than in the CHB group. However, it remains unclear whether this phenomenon is associated with nucleotide substitutions in the C gene; thus, further studies are warranted.
It has been reported that the nucleotide substitution rate in HCC tends to be greater in the X and pre-C/C regions. Most substitution mutations, such as G1653 and G1896, have been associated with an increased risk for the development of HCC [10,19]. However, the impact of these substitutions on disease severity remains unclear; thus, further analysis is needed [10,19]. In the present study, the substitution rates of many sites were found to be greater in the CHB group than in the HCC group. Although many studies have examined nucleotide substitutions in HCC [10,19], the substitutions identified at 29 sites in this study have rarely been reported previously. Nonetheless, the rates of substitutions at 13 of these sites were significantly greater in the HCC group than in the CHB group (Fig. 1). Among them, nt1470 and six other sites were located in a B cell epitope. The sites nt1726 and nt1730 were located in a T cell epitope. A previous study reported that almost 40% of integrated HBV genomes were cleaved at approximately nt1800 [10]. Therefore, the sites nt1799, nt1802, nt1803, and nt1804 may play a potential role in HBV genome integration during HCC development.
In this study, the genotype-specific amino acid substitution rates, as deduced from the nucleotide sites in Figure 1, were compared between the HCC and CHB groups. The mutation rates of F22I/L/P in the pre-S2 region, as well as P33S/T and S144A/T/V in the X region, were significantly higher in the HCC group. F22I/L/P is reportedly associated with immune nonreactivity [30,31]. P33S/T and S144A/T/V in the X region have not been previously reported and, thus, are novel mutations possibly associated with a greater risk for the development of HCC. The amino acid at position aa33 is located in the negative regulation domain of HBx (aa 1-50), which forms a B cell epitope (aa 29-48). The HBx region partially overlaps with the RNase H part of HBV polymerase at the C-terminus and also contains several critical cis-elements. Genetic alterations in this region may affect not only the reading frame of HBx, but also the overlapping cis-elements and the possible binding affinities of this protein to its targets [32]. The amino acid at position aa144 is located in the core promoter of the C-terminus of HBx, which plays a key role in controlling cell proliferation, viability, and transformation [33][34][35].
Deletions in the pre-S and X regions are reportedly associated with the development of HCC [8,36,37]. The pre-S1 and pre-S2 regions contain several T or B cell epitopes and play essential roles in the immune response [38,39]. Pre-S deletion decreases the expression of the surface proteins of HBV, resulting in intracellular accumulation of HBV envelope proteins and viral particles, which induce endoplasmic reticulum stress and oxidative DNA damage, eventually leading to the development of HCC [40,41]. Truncated pre-S2/S sequences are often found in HBV DNA integration sites of HCC patients, and truncated pre-S2/S proteins could specifically activate the MAPK signaling pathway to activate transcription factors such as AP-1 and NF-κB, and thus promote abnormal proliferation of liver cells [42]. In the present study, there was no significant difference in pre-S1 and pre-S2 deletions between the HCC and CHB groups, while the frequency of pre-S2 deletions was higher than that of pre-S1 deletions among HCC patients, which is consistent with the findings of other studies [43][44][45].
Deletions in the C-terminus of HBx are frequently reported in HCC patients [46,47]. Deletions or insertions in the C-terminus of HBx reportedly impair transactivation activity, thereby inhibiting cell proliferation, which may contribute to the development of HCC [48]. In the present study, all 11 deletions (ten for genotype C and one for genotype CD) were located in the C-terminus of HBx. In the CHB patients, the deletions affected codons 125 to 136 of HBx, whereas in the HCC patients they involved more codons, particularly truncations of the C-terminus of HBx. Previous studies have frequently reported deletions at the 3'-end of the X gene, which lead to truncations of the HBx C-terminus [36]. Reportedly, truncations of the HBx C-terminus occur in nearly 80% of HCC tissues, which may contribute to hepatocarcinogenesis via loss of pro-apoptotic capability of full genes, activation of cell transformation, and subsequent tumor promotion [49]. In the present study, the HBx deletion rate was 13.73% (7/51) in HCC patients, which is lower than in previous reports but higher than the 5.26% (4/76) in CHB patients.
Some limitations of this study should be acknowledged. First, there was no significant difference in most of the substitution rates between the HCC and CHB groups. Second, because large deletions in the HBV genome are characteristic of HCC samples, the chance of obtaining full-length sequences may be insufficient. To better understand the clinical relevance of HBV gene substitutions, further prospective investigations of HBV-infected patients are required.

Conclusion
In this study, a total of 127 full-length sequences were isolated from HCC and CHB patients. The hotspot mutations associated with an increased risk of HCC mostly occurred in these sequences and included some substitutions that have not been reported previously. Our results implicate P33S/T and S144A/T/V in the X region as new predictive markers for HCC. A larger range of C-terminal deletions commonly occurs in HCC patients. For the control and prevention of HCC in China, further management is required in populations with a high HBV prevalence to reduce the progression to poor clinical outcomes.

Availability of data and materials
The datasets generated in the current study are based on complete genome sequences, which were deposited by the authors in the National Center for Biotechnology Information GenBank database under accession numbers MT644995 to MT645072.