Serpin peptidase inhibitor (SERPINB5) haplotypes are associated with susceptibility to hepatocellular carcinoma

Hepatocellular carcinoma (HCC) represents the second leading cause of cancer-related death worldwide. The serpin peptidase inhibitor SERPINB5 is a tumour-suppressor gene whose loss of expression is associated with the development of various cancers in humans. However, whether SERPINB5 gene variants play a role in HCC susceptibility remains unknown. In this study, we genotyped 6 SNPs of the SERPINB5 gene in an independent cohort from a replicate population comprising 302 cases and 590 controls. Patients who had at least one rs2289520 C allele in SERPINB5 tended to exhibit better liver function than patients with genotype GG (Child-Pugh grade A vs. B or C; P = 0.047). Next, haplotype blocks were reconstructed according to the linkage disequilibrium structure of the SERPINB5 gene. A haplotype “C-C-C” (rs17071138 + rs3744941 + rs8089104) in the SERPINB5 promoter region showed a significant association with an increased HCC risk (AOR = 1.450; P = 0.031). Haplotypes “T-C-A” and “C-C-C” (rs2289519 + rs2289520 + rs1455555) located in the SERPINB5 coding region were associated with a decreased (AOR = 0.744; P = 0.031) and an increased (AOR = 1.981; P = 0.001) HCC risk, respectively. Finally, an additional integrated in silico analysis indicated that these SNPs affected SERPINB5 expression and protein stability, which significantly correlated with tumour expression and subsequently with tumour development and aggressiveness. Taken together, our findings regarding these biomarkers provide a prediction model for risk assessment.

SERPINB5 11 [also known as maspin (mammary serine proteinase inhibitor)] belongs to the serpin superfamily of proteins and has been grouped with the ov-serpin subfamily (clade B) 12 . SERPINB5 is a tumour suppressor that binds directly to extracellular matrix components, suggesting that the surface binding interaction is responsible for the inhibition of tumour-induced angiogenesis, invasion and metastatic spread 13 . The SERPINB5 mRNA and protein are produced in normal mammary epithelial cells. Gene expression is partially down-regulated in primary cell lines, and the loss of SERPINB5 expression is correlated with increasing malignancy in several tumours, such as breast 14 , prostate 15 , thyroid 16 and skin 17 . SERPINB5 is an important senescence-associated marker and prognostic tumour suppressive factor in the complex carcinogenic process 18,19 .
Based on several known serpin structures including SERPINB5, serpins have the capacity to bind co-factors and undergo serine protease-induced conformational changes in which the serpin reactive centre loop (RCL) develops a non-standard hinge region that is able to trap and inhibit serine proteases 20 . Eventually, SERPINB5 adopts the native serpin fold consisting of nine α-helices and three β-sheets. Strand 4A is in close vicinity to the RCL and may play an important role in a covalent bond interaction with the catalytic site of the target protease 21 .
Although the development of HCC may take 20 to 50 years, early detection of this cancer is seldom possible due to the lack of reliable markers. Therefore, the disease runs a largely asymptomatic course until it is too advanced for successful treatment 22 . Aberrations in some genes may be responsible for certain clinical features of HCC 23 . For instance, differences in the SERPINB5 expression level have been demonstrated between precancerous and malignant lesions 18 . However, the associations between SERPINB5 variants and HCC risk and prognosis have been poorly investigated. Gene expression can be affected by SNPs located within the promoter or other regulatory regions of the gene. Thus, six SERPINB5 SNPs located in two linkage disequilibrium (LD) blocks were genotyped to perform a haplotype-based association analysis in a case-control study of the Taiwanese population to evaluate HCC susceptibility.

Results
Study Population. A total of 892 participants, including 302 HCC cases and 590 controls, were successfully genotyped for further analysis. The demographic characteristics, including mean age, gender, alcohol consumption, tobacco consumption and disease stage, are shown in Table 1. No significant differences in alcohol consumption (P = 0.809) or tobacco use (P = 0.395) existed between the healthy controls and patients with HCC. In contrast, the mean age (Control: 51.11 ± 14.97 years; HCC: 63.01 ± 11.78 years; P < 0.001) was significantly lower in the controls, an unmatched cohort from the general population, and female sex (Control: 18.1%; HCC: 29.5%; P < 0.001) was more prevalent among the HCC patients (Table 1). Overall, an increasing risk of HCC was observed with increasing age.

Frequency Distribution of SERPINB5 Alleles and their Associations with HCC. Six SNPs in the SERPINB5 gene were genotyped in HCC patients and the healthy controls, and the genotype distributions were consistent with Hardy-Weinberg equilibrium (P > 0.05). Table 2 summarizes the basic characteristics of the SERPINB5 SNPs in the study population, which indicated that the highest distribution frequencies for the rs17071138, rs3744941, rs8089104, rs2289519, rs2289520 and rs1455555 loci of the SERPINB5 gene were T/C, C/T, C/T, T/C, G/C and A/G, respectively, in both HCC patients and healthy control subjects. According to the adjusted odds ratios (AORs) and their 95% confidence intervals (CIs) estimated with a multiple logistic regression model, only the rs2289520 CC and GC + CC genotypes showed a significantly (P < 0.05) lower risk of HCC, with AORs of 0.247 (95% CI, 0.113-0.543) and 0.666 (95% CI, 0.477-0.929), respectively, compared with the corresponding wild-type homozygotes after adjusting for confounding factors (Table 2). To explore the impact of the polymorphic rs2289520 genotype of SERPINB5 on the clinicopathological development of HCC, we further classified the HCC patients and healthy control subjects into two subgroups each: one subgroup with at least one polymorphic allele (GC + CC) and the other subgroup with homozygous wild-type alleles (GG). Carrying at least one polymorphic C allele of SERPINB5 rs2289520 was significantly associated with Child-Pugh grade [grade B/C vs. A; odds ratio (OR) = 0.527; 95% CI, 0.328-0.996; P = 0.047]; Child-Pugh grades B/C are significant predictors of poor survival 24 (Table 3). In addition, we analyzed the potential associations between the six SERPINB5 SNP loci and the levels of several serum markers, namely AFP, aspartate transaminase (AST) and alanine transaminase (ALT) values and the AST/ALT ratio. No significant difference in the serum levels of these markers was detected between patients who possessed at least one polymorphic allele and those who did not for any of the SERPINB5 SNPs examined (Table S2).
Table 2. Distribution frequency of SERPINB5 genotypes in 590 controls and 302 patients with HCC. The odds ratios (ORs) with their 95% confidence intervals (CIs) were estimated by logistic regression models. The adjusted odds ratios (AORs) with their 95% confidence intervals (CIs) were estimated by multiple logistic regression models after controlling for age, gender, and tobacco and alcohol consumption. * P value < 0.05 was considered statistically significant.
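To illustrate how an unadjusted OR and its 95% CI relate to genotype counts, the following minimal sketch computes a crude OR from a 2 × 2 table with a Wald interval. The function name and the counts are hypothetical and are not taken from Table 2. The AORs reported in this study came from multiple logistic regression with covariate adjustment, which this sketch does not reproduce.

```python
import math


def odds_ratio_wald(case_carrier, case_noncarrier,
                    control_carrier, control_noncarrier, z=1.96):
    """Crude odds ratio with a Wald confidence interval from a 2 x 2 table.

    Hypothetical helper for illustration only. z = 1.96 gives an
    approximate 95% interval on the log-odds scale.
    """
    cells = (case_carrier, case_noncarrier, control_carrier, control_noncarrier)
    if min(cells) <= 0:
        raise ValueError("all four cell counts must be positive")

    # Cross-product ratio: odds of carriage in cases / odds in controls.
    odds_ratio = (case_carrier * control_noncarrier) / (
        case_noncarrier * control_carrier)

    # Standard error of log(OR) is the root of the summed reciprocal counts.
    se_log_or = math.sqrt(sum(1.0 / c for c in cells))

    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the function.
    or_value, ci_low, ci_high = odds_ratio_wald(10, 20, 5, 40)
    print(f"OR = {or_value:.3f}; 95% CI, {ci_low:.3f}-{ci_high:.3f}")
```

An interval that excludes 1 corresponds to P < 0.05 under the same Wald approximation, which is how the significance flags in a table of ORs can be read.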
Scientific RepoRts | 6:26605 | DOI: 10.1038/srep26605
Haplotype analysis of the SERPINB5 gene. All of the subjects were genotyped for a total of 6 tag SNPs that were selected to cover (r² ≥ 0.70) most of the SNPs located in a 32 kb region, including the SERPINB5 gene (28 kb), its promoter (2 kbp) and the 3′-untranslated region (3′-UTR, 2 kbp). SERPINB5 polymorphisms were further characterized using LD and haplotype analyses. LD was determined pairwise among all 45 SNPs and the haplotype structure of the SERPINB5 gene was analyzed (D' and r²) according to the 1000 Genomes Project data from the East Asian population (CHB + JPT, Fig. S1) 25 . Haplotype blocks were defined by the D' confidence interval method: adjacent SNPs with a D' 95% CI of 0.70-0.98 were assigned to the same haplotype block. Two LD blocks were detected by the Solid Spine method 26 . Block1 (15 kb) consisted of 3 selected SNPs in strong linkage, rs17071138, rs3744941 and rs8089104, in the promoter of SERPINB5 (Fig. S1). Block2 (17 kb) included two completely linked selected SNPs, rs2289519 and rs2289520. Additionally, weak linkage between rs2289520 and rs1455555 was observed in the coding region (Table S1). Finally, a haplotype-based association study was performed to show the association between SERPINB5 haplotypes and the risk of HCC (Table 4). The promoter SNPs of block1 formed essentially three haplotypes in control subjects (frequencies of 42.8%, 35.4% and 11.6%), and one haplotype, "C-C-C", was associated with increased susceptibility to HCC (OR = 1.450; 95% CI, 1.039-2.025; P = 0.029). The expression of SERPINB5 has been demonstrated to be under the control of the transcription factors TP53 27 and TP63 28 .
Moreover, within the 300 bp promoter sequence containing the HCC-risk-associated haplotype, a putative functional role for rs3744941 was identified from the functional annotations in the Encyclopedia of DNA Elements (ENCODE) data 29 (Fig. 1). We determined that rs3744941 was situated at a locus with transcription factor (TF) binding, histone modification patterns, DNase hypersensitivity, and CpG islands that are characteristic of promoters or enhancers in several cell types (Fig. 1b). The effect of rs3744941 may be attributed to a suboptimal BACH2 binding site 30 (Fig. 1b) upstream of the predicted transcriptional start site of the human SERPINB5 gene (Fig. 1b), which enables the modulation of initiation rates in response to the transcriptional status. In addition, the Genotype-Tissue Expression (GTEx) database showed a statistically significant down-regulation of SERPINB5 mRNA expression in whole blood for the rs17071138 variant genotype (TC) compared with the wild-type homozygous genotype (TT, P = 0.046, Fig. S2a). A similar result was also found for rs3744941 in the muscle and skeletal tissue (Fig. S2b). Accordingly, these promoter SNPs may disrupt the BACH2 binding site, or at least decrease TF binding affinity, in the HCC-risk haplotype, thereby reducing gene expression and increasing the susceptibility to HCC (Table 4).
Table 3. Adjusted odds ratio (AOR) and 95% confidence interval (CI) of clinical status and SERPINB5 rs2289520 genotypic frequencies in 302 HCC patients with tobacco consumption. The ORs with their 95% CIs were estimated by logistic regression models. > T2 indicates multiple tumors more than 5 cm or a tumor involving a major branch of the portal or hepatic vein(s). * P value < 0.05 was considered statistically significant. † Child-Pugh grades indicate the severity of cirrhosis: A = 5-6 points, B = 7-9 points and C = 10-15 points.
Multiple alignment of the deduced amino acid sequences revealed inter-species amino acid conservation from humans through frog across a 40-residue stretch of SERPINB5 (Fig. 2a) that includes positions 179 and 187 (the positions of the amino acid substitutions corresponding to rs2289519 and rs2289520, respectively). The amino acids observed at these positions represent a conservative change from a polar amino acid to a slightly polar amino acid, with increased hydrophobicity. These variants are positioned within the functional RCL of the SERPINB5 gene (Fig. 2c).
In order to understand how the polymorphisms could affect protein structure, we analyzed structurally adjacent variants bearing three nsSNPs (S179P, V191L and I322V) from haplotype block2, which were modelled on the published crystal structure of normal SERPINB5 31 (Protein Data Bank, PDB ID: 1XQG, Fig. 2c). Based on physico-chemical properties and the change in free energy score (ΔΔG), we evaluated the functional consequences of deleterious nsSNPs in the RCL domain for protein stability using the Eris server 32 . The nsSNPs rs2289519 (S176P) and rs2289520 (V197L) map onto loops of the s3C and s4C β-sheets of SERPINB5, respectively, relatively near the RCL domain, which is thought to mediate protein activity (Fig. 2c). Further, our study shows that the polymorphic amino acids differ in size from those of the wild-type protein. Replacing the buried wild-type amino acids may leave empty space in the protein, suggesting that the risk-associated haplotype (C-C-G) corresponding to rs2289519-rs2289520-rs1455555 probably alters the catalytic activity of the SERPINB5 protein (Fig. 2d).

Discussion
Several studies have suggested that chromosome 18q21 contains a tumor suppressor gene involved in multiple tumor types 33-35 . Recent studies have reported that SERPINB5 polymorphism is associated with the susceptibility to several carcinomas including gastric 36 , lung 37 , bladder 38 , colorectal 39 , and breast 14 cancers. We hypothesized that genetic variants of SERPINB5 may influence clinical outcomes in localized HCC patients. Six SNPs were included in the present case-control study design. One of the SNPs (rs2289520) is located in exon 1 of SERPINB5. Our data reveal a decreased risk of HCC among individuals with the SERPINB5 polymorphic rs2289520 C/C genotype compared with those with homozygous G/G (Table 2). Only a few studies have examined the functional role of rs2289520 40 , and we present additional evidence for a role of SERPINB5 in HCC, as elevated SERPINB5 gene expression was associated with more aggressive cancers and poorer clinical outcomes.
Further, two SNP haplotypes located in the SERPINB5 promoter and protein-coding regions were examined because a putative BACH2 binding site and the RCL domain are likely to influence alternative translation or expression efficiency and protein stability, respectively. Although directly testing this hypothesis was beyond the scope of the current study, the evidence suggests that decreased promoter activity and protein stability are associated with susceptibility to HCC, implying that SERPINB5 downregulation is associated with increased susceptibility to HCC. Jang et al. showed the polymorphic variant (rs2289518) associated with the cell [...] relatively neglected in terms of exploring the underlying pathogenic mechanisms. An improved understanding of these variants is a prerequisite for developing therapeutic approaches that can eventually ameliorate the clinical phenotype in patients harboring the corresponding lesions. Surveillance should be offered to patients with a high risk of developing HCC. Although biomarkers are not widely accepted as important clinical tools, they contribute valuable information for the management of patients with HCC with regard to surveillance, diagnosis, evaluation of treatment efficacy, and prediction of outcomes. When we compared the levels of clinical pathological markers such as AFP, AST, and ALT, which partially reflect body function and nutritional status, no significant difference between the wild-type and polymorphic genotypes of any SERPINB5 SNP was observed in HCC patients (Table S2). HCC is usually diagnosed in cirrhotic patients (60-80%). Nevertheless, Child-Pugh grade B/C, a significant predictor of poor survival outcome in patients with HCC 42 , was significantly (P < 0.05) associated with carriage of at least one polymorphic C allele of SERPINB5 rs2289520 in the cirrhotic HCC patients (Table 3). These polymorphic markers may further improve the evaluation of biological status and background liver function.
In conclusion, this study combined medical information from a large number of patients with additional bioinformatics analyses to provide comprehensive evidence on SERPINB5 polymorphism in HCC. Our results suggest that the polymorphic promoter SNPs and nsSNPs in SERPINB5 are associated with clinical status and susceptibility to HCC. The combined effects of SERPINB5 polymorphisms at the translational and protein levels appear to facilitate HCC development. Overall, our analyses provide deeper insights into naturally occurring haplotype-based variants. Characterizing the molecular basis of mutations in cancer cells provides insight into tumorigenesis. Accurate biomarkers for such types of variant are required for developing optimal therapeutic approaches that can eventually ameliorate the clinical phenotype in patients harboring the corresponding lesions.

Materials and Methods
Description of the Enrolled Participants. This hospital-based case-control study recruited 302 HCC patients (213 men and 89 women; mean age = 63.01 ± 11.78 years) between 2007 and 2012 at the Chung Shan Medical University Hospital, Taiwan. The HCC diagnosis was based on the criteria specified in the national guidelines for HCC. Specifically, liver tumours were diagnosed by histology or cytology irrespective of the α-fetoprotein (AFP) titre after computed tomography or magnetic resonance imaging data showed at least one of the following: (1) at least one liver mass ≥ 2 cm in diameter; (2) early enhancement and AFP levels ≥ 400 ng/ml; or (3) early arterial phase-contrast enhancement plus early venous phase-contrast washout regardless of the AFP level. During the same study period, 590 ethnic group-matched individuals (483 men and 107 women; mean age = 50.11 ± 14.97 years) who received a physical examination at the same hospital were enrolled as controls. The controls had no self-reported history of cancer at any site. Personal information and characteristics were collected from the study subjects using interviewer-administered questionnaires containing questions on demographic characteristics and the status of cigarette smoking and alcohol drinking. HCC patients were clinically staged at the time of diagnosis according to the tumour, node and metastasis (TNM) staging system of the American Joint Committee on Cancer 43 . Liver cirrhosis was diagnosed by liver biopsy, abdominal sonography, or biochemical evidence of liver parenchymal damage with endoscopic oesophageal or gastric varices. The patients' clinicopathological characteristics, including clinical staging, tumour size, lymph node metastasis, distant metastasis, reactivity with an antibody against HCV (anti-HCV), liver cirrhosis, AFP, AST and ALT levels, were verified by chart review.
Whole blood specimens collected from the controls and HCC patients were placed in tubes containing ethylenediaminetetraacetic acid (EDTA) and then immediately centrifuged and stored at −80 °C. Before commencing the study, approval of the research protocol was obtained from the Institutional Review Board of Chung Shan Medical University Hospital (CSMUH No: CS15099), and informed written consent was obtained from each participant. All the methods applied in the study were carried out in accordance with the approved guidelines.
SNP Selection and Genotyping. Genomic DNA was isolated from the peripheral blood using the QIAamp DNA blood mini kit (Qiagen, Valencia, CA, USA). The final preparation was stored at −20 °C, quantified by measurement of the optical density at 260 nm and used as the polymerase chain reaction (PCR) template. Genotyping of 6 SERPINB5 SNPs (rs17071138, rs3744941, rs8089104, rs2289519, rs2289520, and rs1455555; Fig. 1) with minor allele frequencies > 5% in the HapMap Chinese Han Beijing (CHB) population was performed by the TaqMan SNP genotyping assay (Applied Biosystems, Foster City, CA, USA) 44 . The six SNPs comprised two promoter-region SNPs (rs17071138 and rs3744941), one intronic SNP (rs8089104) and three non-synonymous SNPs (rs2289519, rs2289520, and rs1455555). The SERPINB5 rs17071138 (assay ID: C_33627662_20), rs3744941 (assay ID: C_27493638_10), rs8089104 (assay ID: C_29202434_30), rs2289519 (assay ID: C_22274204_10), rs2289520 (assay ID: C_22274205_10), and rs1455555 (assay ID: C_8932279_10) polymorphisms were assessed using an ABI StepOnePlus™ Real-Time PCR System and analyzed using SDS v3.0 software (Applied Biosystems, Foster City, CA). The fluorescence-based TaqMan SNP genotyping assay has been shown to be suitable for this analysis 45 .
The final volume for each reaction was 10 μL and contained 5 μL of the TaqMan Universal PCR Master Mix, 0.25 μL of the primer/TaqMan probe mix, and 10 ng of genomic DNA. The real-time PCR reaction consisted of an initial denaturation step at 95 °C for 10 min followed by 40 cycles of 92 °C for 15 s and 60 °C for 1 min. The fluorescence level was measured with the Applied Biosystems StepOne Real-Time PCR System (Applied Biosystems). Allele frequencies were determined by the ABI SDS software. Genotyping was repeated on a random 10% of the sample to confirm the results of the original run. For each assay, appropriate controls (non-template and known genotype) were included in each typing run to monitor reagent contamination and as a quality control. To validate the real-time PCR results, approximately 5% of the assays were repeated and several cases of each genotype were confirmed by DNA sequencing analysis.
Bioinformatics Analysis. Several semi-automated bioinformatics tools were used to assess whether SNPs or their linked genetic variants were associated with a putative function that might affect patient outcomes. HaploReg 46 v4, the Genotype-Tissue Expression (GTEx) database 47 and data from the Encyclopedia of DNA Elements (ENCODE) project 48 were used to assess the regulatory potential of candidate functional variants and to examine particular tracks of interest, such as TF-ChIP signals, DNase peaks, DNase footprints and predicted DNA sequence motifs for TFs. The GTEx data were used to identify the correlations between SNPs and whole-blood-specific gene expression levels. The publicly available cBioPortal for Cancer Genomics 49 and UCSC Cancer Genomics Browser 50 for hepatocellular adenocarcinomas were utilized to analyse SERPINB5 gene expression, DNA methylation, molecular features, and clinical outcomes.
Sequence alignment and Protein structure. A multiple sequence alignment was generated using the CLUSTALX package with a standard point accepted mutation (PAM) series protein weight matrix. Five SERPINB5 orthologous protein sequences were obtained from the NCBI gene database, and the key residues were identified based on the alignment as previously described for the RCL domain and secondary structure, respectively 21 . The 3D structural model of human SERPINB5 (PDB ID 1XQG) was downloaded from the RCSB PDB database 51 .
Statistical analysis. Hardy-Weinberg equilibrium was assessed using a Chi-square goodness-of-fit test for bi-allelic markers. The Mann-Whitney U test and Fisher's exact test were used to compare differences in the distribution of age and demographic characteristics between the controls and HCC patients. ORs with 95% confidence intervals (CIs) were estimated using logistic regression models. AORs with 95% CIs were used to assess the association of genotype frequencies with HCC risk and clinical factors. P values less than 0.05 were considered significant. The data were analysed with the SPSS 12.0 statistical software (SPSS Inc., Chicago, IL, USA). Linkage disequilibrium coefficients [D' = D/Dmax (or D/Dmin if D was negative)] were assessed for pairs of alleles between two sites of SERPINB5 polymorphisms, and haplotype blocks were defined using the default setting of the Haploview software 26 . We estimated the common haplotypes with PHASE version 2.1. A likelihood ratio test was used to perform a global test of association between all haplotypes and HCC occurrence.
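The two population-genetic quantities named above can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not the SPSS or Haploview procedure used in this study: the function names are hypothetical, the genotype counts and haplotype frequencies are invented for the example, and the normalised coefficient is returned as an absolute value between 0 and 1.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium.

    Takes genotype counts for a bi-allelic marker (hypothetical helper).
    Returns the chi-square statistic and its P value with 1 degree of
    freedom.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1.0 - p                       # frequency of allele B
    if p in (0.0, 1.0):
        raise ValueError("marker is monomorphic; the test is undefined")

    observed = (n_aa, n_ab, n_bb)
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def d_prime(p_ab, p_a, p_b):
    """Absolute normalised LD coefficient |D'| for two bi-allelic sites.

    p_ab is the frequency of the haplotype carrying allele A at site 1
    and allele B at site 2; p_a and p_b are the allele frequencies.
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        bound = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        bound = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return 0.0 if bound == 0 else abs(d) / bound


if __name__ == "__main__":
    # Hypothetical genotype counts and haplotype frequencies.
    chi2, p_value = hwe_chi_square(120, 260, 210)
    print(f"HWE chi-square = {chi2:.3f}, P = {p_value:.3f}")
    print(f"|D'| = {d_prime(0.30, 0.40, 0.35):.3f}")
```

A marker passes the equilibrium check when the P value exceeds 0.05, and a |D'| close to 1 indicates that two sites are inherited together almost completely.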