Genetic variants of cell cycle pathway genes predict disease‐free survival of hepatocellular carcinoma

Abstract Disruption of the cell cycle pathway has previously been related to development of human cancers. However, associations between genetic variants of cell cycle pathway genes and prognosis of hepatocellular carcinoma (HCC) remain largely unknown. In this study, we evaluated the associations between 24 potential functional single nucleotide polymorphisms (SNPs) of 16 main cell cycle pathway genes and disease‐free survival (DFS) of 271 HCC patients who had undergone radical surgery resection. We identified two SNPs, i.e., SMAD3 rs11556090 A>G and RBL2 rs3929G>C, that were independently predictive of DFS in an additive genetic model with false‐positive report probability (FPRP) <0.2. The SMAD3 rs11556090G allele was associated with a poorer DFS, compared with the A allele [hazard ratio (HR) = 1.46, 95% confidential interval (95% CI) = 1.13–1.89, P = 0.004]; while the RBL2 rs3929 C allele was associated with a superior DFS, compared with the G allele (HR = 0.74, 95% CI = 0.57–0.96, P = 0.023). Additionally, patients with an increasing number of unfavorable genotypes (NUGs) of these loci had a significant shorter DFS (P trend = 0.0001). Further analysis using receiver operating characteristic (ROC) curves showed that the model including the NUGs and known prognostic clinical variables demonstrated a significant improvement in predicting the 1‐year DFS (P = 0.011). Moreover, the RBL2 rs3929 C allele was significantly associated with increased mRNA expression levels of RBL2 in liver tissue (P = 1.8 × 10−7) and the whole blood (P = 3.9 × 10−14). Our data demonstrated an independent or a joint effect of SMAD3 rs11556090 and RBL2 rs3929 in the cell cycle pathway on DFS of HCC, which need to be validated by large cohort and biological studies.


Introduction
Hepatocellular carcinoma (HCC) is one of the most frequently diagnosed cancers and the leading causes of cancer death worldwide. As a high incidence area of HCC, China accounts for about 50% of the total number of cases and deaths worldwide [1,2]. In 2015, it had an estimated of 466,100 and 422,100 patients is diagnosed and die of HCC in China,

ORIGINAL RESEARCH
Genetic variants of cell cycle pathway genes predict disease-free survival of hepatocellular carcinoma respectively [1], which makes HCC a major public health concern in China. For the past decade, cancer staging systems, such as the Barcelona Clinic Liver Cancer (BCLC) staging system and Cancer of the Liver Italian Program (CLIP) score, have been established for the prognosis assessment by using clinical variables mainly including tumor size, number of primary tumors, portal vein tumor thrombosis, and inflammatory degree [3,4]. But none of these systems has been universally adopted by clinicians. Many novel molecular biomarkers, such as Alpha-fetoprotein, circulating tumor cells and circulating microRNAs, have been also applied to predict HCC progression [5]. However, limitations of low sensitivity and specificity are still associated with these biomarkers when used in disease staging [5]. Long-term outcome prediction for HCC patients remains unfavorable [6,7], which hampers the development of personalized clinical assessment for HCC patients and calls for identifying additional, more discriminant prognostic indicators.
In recent years, the development of human genome research has been witnessed much success in mining cancerrelated germline genetic variants, especially single nucleotide polymorphisms (SNPs). The current advance in genomewide association studies (GWASs) has led to the identification of several SNPs that are associated with HCC risk [8][9][10][11]. Additionally, a series of studies also reported that the HCC patients survival was associated with SNPs in genes functioning in pivotal biological pathways or processes, such as cytokine genes [12,13], telomere maintenance genes [14], DNA repair genes [15], Wnt/β-catenin pathway genes [16]. These pieces of evidence demonstrate that genetic variants act as indicators of different processes of HCC and may provide further information beyond current clinical staging and pathologic prognostic assessments.
The cell cycle is a set of organized and monitored events responsible for proper cell division into two daughter cells, which includes four traditional subdivision phases (i.e., G1, S, G2 and M) and is controlled by the signaling pathway comprising genes encoding cyclins, cyclin-dependent kinases, and cyclin kinase inhibitors as well as the related regulators (e.g., MYC) [17][18][19]. Disruption of the cell cycle pathway can result in cell cycle arrest and has previously been related to prognosis of human cancers [20,21]. For example, CDKN2A and RB1 expression levels have been associated with survival of patients with advanced-stage ovarian cancer [22]. In addition, several studies have also demonstrated that aberrant expression of cell cycle pathway genes, including cyclin (A, D1 and E), CDC2, p27 and p21, is associated with prognosis for patients with HCC [23][24][25][26][27].
We have previously investigated the association of genetic variants in cell cycle pathway genes with susceptibility to HCC and found that SNPs in MCM4, CHEK1 and KAT2B were associated with HCC risk [28]. There are additional studies demonstrating the associations between genetic variants of cell cycle pathway genes and survival of cancers of the ovaries, oral cavity and lung [29][30][31]. However, the associations between genetic variants of cell cycle pathway genes and prognosis of HCC remain largely unknown. In this study, we hypothesized that genetic variants in cell cycle pathway genes are associated with survival of HCC patients. To test this hypothesis, we evaluated the associations of potential functional SNPs in 16 main cell cycle pathway genes (CDC25C, CDC7, CDKN1A, CDKN2A, CHEK1, MCM4,  MCM7, MYC, ORC6L, KAT2B, PLK1, RAD21, RBL2, SMAD3,  TGFB3, and YWHAB) with clinical outcomes of 271 HCC cases who had undergone radical surgery resection.

Study population
The patients were derived from our previously published case-control study [28]. A total of 1127 patients with newly diagnosed and pathologically confirmed HCC were recruited consecutively between June 2007 and December 2013 from the First Affiliated Hospital and Affiliated Tumor Hospital of Guangxi Medical University. Among these patients, 271 undergone radical surgery resection with completed clinical information and long-term follow-up data for survival analysis. Clinical information (including age at diagnosis, sex, ethnicity, hepatitis B virus infection, smoking status, drinking status, capsule, portal vein tumor thrombosis, cirrhosis, and BCLC staging) was collected from medical records. Peripheral venous blood was collected in a vacuum EDTA anticoagulant tube from each participant. Genomic DNA was extracted using a standard phenol/chloroform extraction method and stored at −80°C. The study protocol was approved by the ethical committee of Guangxi Medical University, and a written informed consent was obtained from each of the participants.

Follow-up
All of the patients were followed up to collect data on postoperative recurrence, distant metastasis and death condition through phone calls or the medical records. The endpoint was disease-free survival (DFS). DFS was defined as the interval between the date of tumor resection and the date of the presence of relapse/metastasis/death or until the last follow-up. Patients who were lost to follow-up, who were alive at the last follow-up and who did not have developed recurrence/metastasis were censored.

SNP selection and genotyping
The methods of SNP selection and genotyping have been described previously [28]. In brief, we found 3826 differential expressed genes by analysis of three HCCrelated genome expression microarrays (GSE14520, GSE25097, and GSE12941) from GEO database. After performing the gene ontology classification and pathway enrichment analysis by using blast2GO and DAVID (https://david.ncifcrf.gov/), we identified 40 cell cycle pathway genes involved in the cellular process. Considering the literature on these genes with cancer survival, we selected 16 genes (CDC25C, CDC7, CDKN1A, CDKN2A, CHEK1, MCM4, MCM7, MYC, ORC6L, KAT2B, PLK1, RAD21, RBL2, SMAD3, TGFB3, and YWHAB) for further analysis. The genotype information of the 16 genes were downloaded from Hapmap website (http://hapmap.ncbi. nlm.nih.gov/), and functional SNPs were selected using SNPinfo (https://snpinfo.niehs.nih.gov/) [32]. Considering the minor allele frequency (MAF ≥ 0.05), linkage disequilibrium (LD, r 2 > 0.8) and potential functions in transcription factor-binding, microRNA-binding and splicing sites, we finally included 24 SNPs with putative functions in this study (Table 1). Then, we performed the genotyping by using a MassARRAY system (Sequenom, San Diego, CA, USA) and a matrix-assisted laser desorption ionization-time of flight mass spectrometry method according to the manufacturer's instructions. All the primers for PCR were designed using the Assay Designer software package of the Sequenom system (San Diego, CA, USA).

Statistical methods
Cox proportional hazards regression analysis was conducted to assess the associations between SNPs and DRS in an additive model with adjustment for age at diagnosis, sex, ethnic, hepatitis B virus infection, smoking status, drinking status, capsule, portal vein tumor thrombosis, cirrhosis and BCLC staging. The false-positive report probability (FPRP) approach was applied to correct for multiple comparisons [33]. We assigned a prior probability of 0.1 to detect an HR of 2.0 for the adverse genotypes and alleles of SNPs with an elevated risk. Only the significant results with FPRP <0.2 were considered noteworthy. The stepwise Cox model including noteworthy SNPs and clinical variables was conducted to choose independent predictive SNPs. Kaplan-Meier estimation of survival functions and Log-rank tests were used to evaluate the single and combined effects of risk genotypes on DFS. The ROC curve was used to evaluate the performance of different prediction models, and Delong's test was conducted to compare the area under the curves (AUC) across different models [34]. Finally, to assess the correlation between genetic variants and mRNA expression of the corresponding genes, the expression quantitative trait loci (eQTL) analysis was also performed using data from the GTEx project (http://www.gtexportal.org/home/) and the HapMap3 Project (Release version 2) by general linear regression model in additive genetic model [35,36]. All analyses were performed using SAS (version 9.1.3; SAS Institute, Cary, NC), unless otherwise specified.

Results
Basic characteristics of study populations

Survival analysis of SNPs and DFS
As shown in Table 3 (Table 4). Taken all together, we selected SMAD3 rs11556090 A>G and RBL2 rs3929G>C as the final independent SNPs for further analyses. The identified SNPs were both predicted to be in the miRNA-binding sites (Table 1), which may affect the binding capacity of miRNA and the corresponding gene. As shown in  (Fig. 1A-D).

Combined effects and ROC curves for DFS prediction
To evaluate the joint effect of SMAD3 rs11556090 and RBL2 rs3929, we combined risk genotypes of rs11556090 CC and rs3929GG+GC into a single variable as the number of unfavorable genotypes (NUGs) ( Table 5). The trend test indicated that increased NUG was associated with a shorter DFS regardless of univariate (trend test: P = 0.0003) or multivariate analysis (trend test: P = 0.0001). The Kaplan-Meier curves for depicting associations between NUG and DFS is also shown in Figure 1E. Using the ROC curves, we further evaluated predictive value of the unfavorable genotypes. The results demonstrated that, as classification of 1-year DFS, the AUC was significantly increased from 68.4% to 74.1% (P = 0.011), when adding NUG to the model including clinical variables (age, sex, ethnic, hepatitis B virus infection, smoking status, drinking status, capsule, portal vein tumor thrombosis, cirrhosis and BCLC staging) as classifiers (Fig. 1F).

In silico functional validation
As mapped on the UCSC website (https://genome.ucsc. edu/), rs11556090 is located in exon 9, 3ʹUTR of SMAD3,  while rs3929 is located in exon 22, 3ʹUTR of RBL2 ( Fig. 2A  and B). Both of them were predicted to be miRNA binding sites of the corresponding genes by the SNPinfo. We further conducted the expression quantitative trait loci (eQTL) analysis by searching on the GTEx Portal (http:// www.gtexportal.org/home/) and using data from the 1000 Genomes Project to examine the effects of SNPs on the mRNA expression levels of corresponding genes. We only found information about RBL2 on GTEx Portal that the rs3929 C allele was significantly associated with increased expression levels of RBL2 in liver tissues (P = 1.8 × 10 −7 , Fig. 2C) and the whole blood (P = 3.9 × 10 −14 , Fig. 2D) by general linear regression analysis. Data of Asian population (the Chinese and Japanese together) from the HapMap 3 Project showed no significant association for neither rs11556090 nor rs3929 with their corresponding gene expression levels (P = 0.486 and 0.686, respectively, Fig. 2E and F).

Discussion
In this hypothesis-driven study, we identified SMAD3 rs11556090 A>G and RBL2 rs3929 G>C in the cell cycle pathway may independently or jointly modulate DFS of HCC patients. We also observed that combining with unfavorable genotypes of these loci significantly improved prediction performance of HCC survival in the model that included the known clinical prognostic variables. Furthermore, the results from the GTEx Portal demonstrated that the C allele was significantly associated with increasing expression levels of RBL2, suggesting that RBL2 rs3929 may modulate DFS of HCC patients possibly through a mechanism of modulating gene expression.
Cell cycle control is an important process for DNA damage repair. When DNA damage occurs, cell cycle checkpoints are activated and cell cycle progression is paused, allowing cells sufficient time to repair the damage and determining whether cells will survive or die, which is controlled by the cell cycle pathway players [21]. Previous studies have shown that the identified genes, SMAD3 and RBL2, are important players in the cell cycle control. For example, SMAD3 plays a critical part in the regulation of transforming growth factor-β (TGF-β), which is a potent inhibitor of cell cycle progression at the G1 phase [37]. Likewise, RBL2, encoding one of the retinoblastoma family proteins (RBL2, also known as p130 and pRb2), is a key switch at the restriction point (R) and inhibits the S phase entry by physical combination with E2F transcription factor and actively represses gene transcription that regulates DNA synthesis in the S phase [38]. Other studies reported that SMAD3 functioned as a tumor suppressor in cancers of the stomach, [39], breast [40] and prostate  [41], while RBL2 functions as a tumor suppressor in cancers of the breasts, ovaries and endometrium [42]. A similar function in HCC has also been illustrated for both of SMAD3 and RBL2 in the previous studies [43,44], suggesting the significant function of SMAD3 and RBL2 as a tumor suppressor in HCC.
In this study, both of the two loci we identified were computationally predicted to be located in the microRNA-binding sites by the SNPinfo. SMAD3 rs11556090 is located in the binding site of hsa-miR-132, hsa-miR-188, hsa-miR-212, hsa-miR-30b, hsa-miR-337, hsa-miR-431, hsa-miR-532 and hsa-miR-9, while RBL2 rs3929 was located only in has-miR-134. Surprisingly, the above-mentioned miRNAs, except for has-miR-337 and has-miR-532, have been associated with development or survival of HCC [45][46][47][48][49][50][51]. Therefore, it is likely that the underlying mechanisms of the two SNPs in modulating HCC survival are to regulate the transcription of proteins by affecting activities of miRNA-binding. SNPs in the potential miRNA-binding sites are suggested to be functional and always good candidates for causal variants of human disease occurrence or progress [52]. Specifically, we found that the RBL2 rs3929 C allele associated with a longer DFS was correlated with increased mRNA expression levels of RBL2 in normal liver tissues and peripheral blood lymphocytes by using data from the GTEx Portal. These results are consistent with the characteristics of a tumor suppressor, although the majority of the donors for the GTEx project are of the white population. However, somatic mutations in RBL2 were observed previously only in primary nasopharyngeal carcinomas, lung tumors, and Burkitt's lymphomas but not in HCC. Taken together, we can infer that RBL2 rs3929 may be a functional and important variant for HCC progression.
The results also demonstrated a significant locus-dosage effect in patients with a larger number of unfavorable genotypes, who had a shorter DFS, suggesting an interaction of SMAD3 rs11556090 and RBL2 rs3929 in the cell cycle pathway on HCC progress. The potential mechanism under the observed association is probably that both of SMAD3 and RBL2 are the key substrates of CDK4, which can control cell cycle progression from the G1 to S phase by phosphorylation of these proteins [37,53]. Importantly, The combination of the two loci significantly improved prediction performance of the model that included the known prognosis factors of HCC patients.
It should be noted that there were limitations of this study. The first limitation included restriction of the recruit population. We conducted the study only in the population from Guangxi Zhuang Autonomous Region, where the main cause of HCC is HBV infection [28]. The results may not apply to other ethnic groups or disease etiologies, such as hepatitis C virus infection, nonalcoholic fatty liver disease and alcoholic liver disease. Secondly, the significance of RBL2 rs3929 and SMAD3 rs11556090 in the prognosis of HCC has been demonstrated in this study, however, it still lack epidemiology validation and biological mechanism confirmation, besides the significant eQTL results for RBL3 rs3929 from the GTEx project. Finally, because of lack of more detailed clinical information, we did not efficiently evaluated the potential effects of different therapies after radical surgery resection on the outcomes of HCC patients, or their potential associations with the identified SNPs. Therefore, larger scale, more comprehensive and multi-institutional epidemiology investigations and biological studies are warranted to further validate these results.
In conclusion, this study identified a prognostic role of SMAD3 rs11556090 and RBL2 rs3929 of the cell cycle pathway in HCC patients. Although both of SMAD3 and RBL2 are important in the control of cell cycle progression from the G1 to S phase, the interpretation of our findings should be cautious, until validated by mort patient cohorts and HCC cell lines.