Genomic Identification of RNA Editing Through Integrating Omics Datasets and the Clinical Relevance in Hepatocellular Carcinoma

RNA editing is a widespread post-transcriptional mechanism to introduce single nucleotide changes to RNA in human cancers. Here, we characterized the global RNA editing profiles of 373 hepatocellular carcinoma (HCC) and 50 adjacent normal liver samples from The Cancer Genome Atlas (TCGA) and revealed that most editing events tend to occur in minor percentage of samples with moderate editing degrees (20–30%). Moreover, these RNA editing prefer to be A-to-I RNA editing in protein coding genes, especially in 3′UTR regions. Considering the association between DNA mutation and RNA editing, our analysis found that RNA editing maybe a complementary event for DNA mutation of HCC risk genes in HCC patients. We next identified 454 HCC-related editing sites, and many locate on the same genes with the same editing patterns. The functional consequences of editing revealed 2,086 functional editing sites and demonstrated that most editing in coding regions are non-synonymous variations. Furthermore, our results showed that editing in the 3′UTR regions tend to influence miRNA–target binding, and the editing degree seems to be negatively correlated with gene expression. Finally, we found that 46 HCC-related editing sites with consequence are able to distinguish the prognosis differences of HCC patients, suggesting their clinical relevance. Together, our results highlight RNA editing as a valuable molecular resource for investigating HCC mechanisms and clinical treatments.


INTRODUCTION
Hepatocellular carcinoma (HCC) is a kind of malignant tumor with high mortality. It ranks third among all cancer-related mortality in the world. About 3% of the patients with cirrhosis can result in HCC, which is the most serious complication in chronic liver diseases (1). The high mortality rate of HCC is mainly due to it being asymptomatic in the early stage and the lack of effective treatments for even mid-term patients. This poses a great threat to the patient's life and also brings heavy economic burdens to the society and families. Therefore, it is of great significance to clarify the pathogenesis of HCC as soon as possible, and thus formulate more effective strategies for clinical diagnosis and treatment. Previous studies on the pathogenic mechanisms of cancer suggested that DNA mutations are a driving factor in cancer development; however, many HCC tumor samples were found to be free of carcinogenic DNA-driven mutations (2), which indicate that other driving events are involved in the occurrence and development of HCC.
RNA editing is an important post-transcriptional modification event, which change genetic information at RNA level and generate results similar to DNA mutations, thus increasing the diversity of transcripts and proteomes (3). The most common type of RNA editing in human cells is the transformation of adenine nucleotides (A) into inosine (I), which was mediated by adenosine deaminase family proteins (ADARs) (4). Because inosine (I) is recognized as guanine (G) nucleotide during translation, it is also called A-to-GRNA editing. In addition to changes in A-to-I (G) RNA editing, human cells also have a small number of other RNA editing types (5). Recent bioinformatics analysis found that RNA editing events are extensive across the human transcriptome (6).
RNA editing in normal cells are associated with adaptive evolution and cell development (7). Conversely, dysfunction in RNA editing systems will have a series of effects on subsequent RNA regulation processes. First, RNA editing in the protein coding region can affect amino acid translation, producing proteins with different structures or functions, and then affecting protein expression activity. For example, the A-to-I hyper-editing in RHOQ transcripts will induce the abnormal elevation of RHOQ protein, and thus promote the invasion and metastasis of cancer cells in colorectal cancer (8). The editing of SLC22A3 transcript can induce the down-expression of SLC22A3 protein, which contributes to the early invasion in familial esophageal squamous cell carcinoma (9). Second, about 95% of multi-exon genes produce different transcripts through alternative splicing (10,11), which is common in human liver cancer (12). RNA editing occurring in splicing sites or splicing regulatory elements possibly affects the variable splicing processes of RNA. For instance, the hyper-editing on one of the potential branching sites of PTPN6 causes the third intron retention, which is associated with the pathogenesis of acute leukemia (13). Third, genomewide sequencing analysis revealed that most RNA editing sites were located in the non-coding regions of the genome, including the 3 ′ UTR, intron and intergenic regions, which may influence the regulation of non-coding RNAs, especially miRNA regulation on the 3 ′ UTR regions (14,15). These studies indicate that RNA editing events will affect a series of downstream RNA regulation processes, which are closely related to the disease processes.
Recently, the incidence and progression of HCC were found associated with RNA editing events and may further help us reveal the pathogenic mechanisms underlying HCC. Chen et al. suggested that the hyper-editing event of AZIN1 results in a serine-to-glycine substitution at residue 367 of the AZIN1 protein, which may be a potential driver in the pathogenesis of HCC (16). Chan et al. identified an average of 20,007 A-to-I RNA editing events in transcripts by utilizing RNA-seq of three paired HCC and their adjacent non-tumor samples, then validated the expression level of ADARs that are related to editing degrees of FLNB and COPA in a large cohort with microarray analysis. They also found that the expression level of ADARs which mediate A-to-I RNA editing is related to the risk of HCC recurrence (17). Similarly, research on two pairs of HCC patients revealed that BLCAP transcript is hyper-edited, which will enhance the phosphorylation of AKT, MTOR, and MDM2 and inhibit the function of TP53, thus promote cell proliferation and tumor development (18). Another study identified HCC-related RNA editing sites by genomic and transcriptomic analysis of nine pairs of HCC and normal samples (19). However, the number of HCC patients used in these studies is relatively small, and their statistical efficacy is limited. Furthermore, the main concern is the limited effect of ADAR enzymes on RNA editing, and the limited analysis of downstream regulation systems, which RNA editing may affect.
Recently, The Cancer Genome Atlas (TCGA) project provides a large number of omics data of malignant tumors. Han  Most importantly, the specific changes in downstream RNA regulation system caused by RNA editing were not thoroughly analyzed, and the genome-wide distribution pattern of RNA editing in HCC is not described in both studies. In addition, as they focused on the common pathogenic mechanism of multiple cancer types, many HCC-related RNA editing sites were not presented. We suggested that it should be clearly described whether the RNA editing was observed in cases where DNA mutations are absent in HCC-relevant driver genes.
In our study, we intend to de novo identification of HCCrelated RNA editing sites by integrating multiple omics data with bioinformatics methods, including genomic mutation, transcriptomic variation, and reference single nucleotide polymorphism (SNP) information, using 373 HCC tumor samples and 50 adjacent normals from TCGA. Here, the genomic distribution pattern of RNA editing events was described. We also deeply analyzed the influence of HCC-related RNA editing on downstream regulation system, including the effect on protein translation and miRNA regulation. Based on clinical information of HCC patients, our study further identified new biomarkers for clinical prognosis, which will promote the disclosure of molecular mechanisms of HCC from the perspective view of RNA editing.

Data Retrieval
Pair-end RNA-seq BAM files originating from 373 HCC cancer samples and 50 adjacent normal liver samples were downloaded from the database of Genotypes and Phenotypes (dbGaP) originally from The Cancer Genome Atlas (TCGA, https:// portal.gdc.cancer.gov/) research project (22). Validated RNAseq FASTQ files of HCC cell lines and normal liver samples were downloaded from ArrayExpress (https://www.ebi.ac.uk/ arrayexpress/, E-MTAB-4052) (23), including three human normal liver samples and two Huh7 RNA-seq FASTQ datasets. DNA mutation, gene expression, and clinical information datasets were also downloaded from TCGA. Single nucleotide polymorphism (SNP) annotations were downloaded from dbSNP version 137 (24) and the 1000 Genomes Project (25).
We obtained the list of known tumor suppressive genes (TSGs) and oncogenic genes (OGs) from a previous study (28), which integrates mRNA expression, copy number variations (CNV), and DNA mutation information and generated a continuous ranked list for each gene, ranging from more negative (TSGs) to more positive (OGs) with consistent changes across tumors. Here we used a strict score threshold to get OGs (score ≥7) and TSGs (score ≤−7) for further analysis. The statistical significances for the enrichment of OGs and TSGs were calculated by hypergeometric tests. All of the data resources mentioned above are shown in Supplementary Table S1.

De novo Detection of RNA Editing Sites
First, the downloaded BAM files of the TCGA samples were converted to FASTQ using BEDtools (29), the FASTQ files were aligned to the human reference genome (GRCh38) by STAR with default parameters (--outFilterMultimapScoreRange 1 -sjdbScore 2 -outFilterScoreMinOverLread 0.33 --outFilterMultimapNmax 20 --sjdbOverhang 100) (30). Second, putative RNA editing sites were identified by Genome Analysis Toolkit (GATK4) with default parameters (HaplotypeCaller --gcs-max-retries 20 --heterozygosity 0.001 --max-reads-peralignment-start 50 --min-base-quality-score 10), using uniquely mapped reads after PCR duplicates were removed (31). Third, computational filters for vcf files were applied through five steps: (i) removing DNA mutation sites for each HCC sample; (ii) taking out all known SNPs in dbSNP version 137 or the 1000 Genome Project, and also insertion or deletion sites; (iii) further filtering sites to obtain editing sites with high confidence: if Fisher Strand (FS) >20, or Quality by Depth (QD) <2, or editing was supported <2 reads, or total coverage reads <10, these sites were removed, and we required at least 10% difference between the editing degrees of 90% quantile and 10% quantile across all samples; (iv) sites with 100% editing degree were also filtered, as 100% editing efficiency is thought to be unrealistic (6); (v) keeping variants detected in at least 1% of the samples because they are unlikely to be rare variants. Editing degree was defined as the percentage of edited reads among the total mapped reads at a given site (20).
Finally, we restricted editing sites to 46 human chromosomes. To get the exact RNA, which was edited, we used BEDtools to map the editing sites with gene annotation gtf files. If the RNA editing sites were simultaneously mapped to two strands, these sites were further removed. At last, we get 19,431 RNA editing sites for further analysis.
The validated RNA-seq FASTQ datasets were aligned to the human reference genome (GRCh38) by bowtie, processed by GATK4 with default parameters, with no matched DNA mutation datasets. Other pipelines were similar to the aforementioned method.

Identification of HCC-Related RNA Editing Sites
HCC-related RNA editing sites include HCC gain, HCC loss, and significant dysregulated editing (dys-edit) sites. HCC gain or HCC loss editing sites were defined by Fisher's exact test (Benjamini-Hochberg correction, adjust p < 0.05), with HCC gain editing sites restricted to be no more than 5% editing sites in normal samples and HCC loss editing sites restricted to be no more than 5% editing sites in cancer samples. Dys-edit sites were just focused on the editing degree of 50 HCC cancer samples and mapped 50 normal samples, which were determined by two steps: (i) using the paired Student's t-test (Benjamini-Hochberg correction, adjust p < 0.2 and p < 0.01); (ii) restricting editing sites to those having more than 0.25 editing degree change in at least two pairs of normal and cancer samples. The HCC dys-edit sites identified in the above two steps were changed in the same direction (simultaneously hyper-edited or hypo-edited in two steps). Finally, we identified 454 HCC-related RNA editing sites.

Functional Enrichment Analysis for HCC-Related RNA Editing Sites
First, HCC-related RNA editing sites were mapped to gene name by BEDtools. Then, we performed functional enrichment analysis by hypergeometric test (Benjamini-Hochberg correction, adjust p < 0.05). We focused on chemical and genetic perturbations (CGP), reactome pathways and biological processes (BP) of gene ontology (GO) originating from MsigDB.

Protein Coding and Alternative Splicing Change
To define whether an editing site can change protein translation or alternative splicing, we re-annotated them by ANNOVAR (32).

miRNA-Target Binding Prediction
RNA editing in the 3 ′ UTR regions may influence the binding and regulation of miRNAs, including loss of existing miRNA-target regulation and gain of new regulation, or change the binding strength of existing miRNA-target regulation. Therefore, for each RNA editing site in the 3 ′ UTR regions, we simultaneously calculated and compared the miRNA binding in both edited and reference 3 ′ UTR regions. We computationally predicted whether an miRNA binds to the mRNA regions around editing sites using miRanda (33) as described in Hwang et al. (5), which calculates the binding energy to estimate the thermodynamic properties of a predicted duplex and a complementarity score to estimate the mismatch. Briefly, for mRNA as a target sequence, two types of sequences were prepared with flanking regions of editing sites (50 bp upstream and downstream) in all mRNA transcripts in UCSC: the reference sequence and the editing sequence. The mature miRNA sequences were also prepared, which were obtained from miRBase (34). The binding energies between miRNA sequence with both a reference and edited mRNA sequences were calculated by miRanda (v3.3a) with default parameters (score > 140, gap-open penalty set to −4 and gapextend penalty set to −9). We next used delta G <-14 kcal/mol as a threshold for free energy of duplex formation to obtain more confident miRNA-target regulations, which were maintained for the following analysis. The comparison was performed for these two types of mRNA sequence, with all the predicted binding pairs of miRNAs and mRNA targets. If miRNA-target relationships just appeared in reference but not in the edited sequence, these were defined as "edited loss" conversely, defined as "edited gain." As for the relationships that both appeared in the reference and edited sequences, if the binding energies changed more than 14 kcal/mol between the reference and edited sequences, they were defined as "edited change." The "edited gain, " "edited loss, " and "edited change" constituted the candidate pool of RNA editing sites that may induce miRNA-target regulations to change. Finally, if the editing of an mRNA target induces more than 10 miRNA-target regulation change, this editing site was defined as having an miRNA-target regulation consequence.

Expression-Related RNA Editing Sites
To define whether RNA editing can influence RNA expression, we calculated the Pearson correlation between editing degree and expression level among samples where both expression and editing degree were measured (editing degree is not 0; Benjamini-Hochberg correction; adjust p < 0.05). The normalized gene expression levels of raw read counts were calculated using DESeq2 (35). First, we defined the editing risk score for each HCC clinical sample. Univariate Cox regression analysis was performed to evaluate the association between survival duration and the editing degree of each editing site. A regression coefficient with a plus sign indicated that increased editing degree is associated with an increased risk of survival (risky editing); conversely, a minus sign indicated that an increased editing degree is associated with a decreased risk of survival (protective editing). More specifically, for editing sites, which are located in a gene region, we assigned a risk score to each HCC patient according to a combination of the product of the editing degree and gene expression, weighted by 1 or −1 according to the regression coefficients from the univariate Cox regression analysis mentioned above. The risk score for each patient was calculated as follows: where, β i is 1 or −1 when the Cox regression coefficient of the editing site i is a positive or negative value, respectively. The n is the number of HCC-related RNA editing sites with consequence that are located in gene regions with an editing degree more than 0 in each sample. Exp gene(i) is the expression level of gene in which the editing site i occurs. Edit i represents the editing degree of the editing site i.
To investigate whether risk scores are associated with tumor grades and tumor stages, we use the Wilcoxon rank-sum test. As the sample number with a tumor grade 4 or tumor stage 4 is too small, only 12 and 13, respectively, we only considered grade 1, grade 2, and grade 3 plus (including grade 3 and grade 4) for tumor grades and stage 1, stage 2, and stage 3 plus (including stage 3 and stage 4) for tumor stages. All patients were then classified into high-risk and low-risk groups using the median risk score as the cutoff point. The Kaplan-Meier method was further used to estimate the differences in overall survival time for these two patient groups (log-rank test). Patients having higher risk scores were expected to have poor survival outcomes.
We further focused on three prognostic-related editing sites whose editing degrees are significantly correlated with survival (p < 0.05). The risk scores were calculated according to the mathematical formula above. For HBV/HCV infected patients or non-alcoholic fatty liver patients, we did a similar analysis mentioned above to obtain a risk score for each sample using three prognostic-related editing sites. To determine whether editing risk scores can provide additional predictive power, we performed a multivariate survival analysis using prognostic factors, including gender, age, body mass index (BMI), grade, and stage of HCC patients, along with editing risk score.

Global Properties of the Inferred RNA Editing Sites in HCC and Normal Samples
Several de novo methods for detecting RNA editing were developed recently (5,6,20). Here we combined these methods and proposed a multi-stage method to gradually identify RNA editing sites by integrating DNA mutation and SNP datasets ( Figure 1A) (see details in the Methods section). Totally, we obtained 19,431 RNA editing sites for further analysis. In general, HCC tumor samples have a higher percentage of RNA editing sites compared to the normal samples ( Figure 1B). More than half of the editing sites occurred in no more than 10 samples ( Figure 1C). Most editing sites presented moderate editing degree, where 20-30% editing degree accounted for the largest proportion ( Figure 1D). We also found that A-to-IRNA editing accounts for most of the RNA variants in the list, and the following enriched variant types were T-to-C, G-to-A, and Cto-T RNA editing, which was consistent with previous research [ Figure 1E; (5,6)]. We next annotated these editing sites with gene types and revealed that most of the editing sites are located on protein-coding genes. In addition, a moderate number of RNA editing sites are located in intergenic and lncRNA regions ( Figure 1F). Consistent with a previous study, more than half of the editing sites are in the 3 ′ UTR regions, accounting for 53.68%, and the following are in the intron regions ( Figure 1F). Notably, Han et al. reported just a few A-to-I RNA editing in the CDS regions; here we identified about 5% editing events in the CDS regions, which include many other types of RNA editing. After we restricted to A-to-I RNA editing, the percent decreased to be 1.66% in the CDS regions (Supplementary Figure S1).

RNA Editing May Be a Complementary Event for DNA Mutation of HCC Risk Genes in HCC Patients
As RNA editing and DNA mutation have a similar effect to increase the diversity of transcripts in cells, thus we wonder whether there is a connection between these two events. For a given gene region, if it is identified to be mutated, it would not be identified as editing at the same time in our recognition method. However, we can dissect whether there is a correlation between RNA editing and DNA mutation in HCC risk gene sets. We obtained 33 HCC relevant cancer genes from a previous report (36), which we defined as HCC risk genes, such as TP53, PTEN, and TERT, of which we observed 24 risk genes with DNA mutation (145 mutation sites involved), and six risk genes with RNA editing (14 editing sites involved). HCC patients with DNA mutation (164) and 35 cases lacking DNA mutations in HCC risk genes were observed (Figures 2A,B). It is found that all of these 35 patients that lacked DNA mutation had RNA editing in the HCC risk genes. For example, patient TCGA-CC-A3M9-01 had no mutated risk genes, but had RNA editing sites in PTEN (chr10:87967417) and UBE2H (chr7:129830735).
In patient TCGA-CC-A3MB-01, three sites of RNA editing occurred in the risk gene, TERT (chr5:1262789, chr5:1262852, chr5:1263350), and the same with patient TCGA-DD-AADV-01 ( Figure 2B). In addition, we found that the percentage of the edited sites in the HCC risk genes is much higher in patients who lacked DNA mutation compared to patients with DNA mutation in the HCC risk gene regions (p < 2.2e-16, Wilcoxon test) ( Figure 2C). Meanwhile, the mutation percentage is much lower in edited patients compared with non-edited patients in HCC risk gene regions (p = 4.63e-16, Wilcoxon test) (Figure 2D). The observations above implied that RNA editing was a risk factor to HCC as a complementary event for DNA mutation in HCC risk genes.

HCC-Related RNA Editing Prefer to Locate on Liver-Specific Genes
Chen et al. proved that RNA editing of AZIN1 promotes progression of HCC (16). In our study, by genome-wide identification of RNA editing sites across large RNA-seq samples, we aimed to reveal more HCC-related RNA editing sites. Generally, we identified 454 HCC-related RNA editing sites, including 264 HCC gain, 166 HCC loss, and 24 HCC dysedited sites (Figures 3A,B, Table 1). Notably, a serine-to-glycine substitution at residue 367 of AZIN1 was also identified by our method, which is reported to be hyper-edited in HCC patients compared to normal samples in previous studies (16). Two editing sites of COX18 were hypo-edited in HCC compared with paired normal samples. The most significantly dys-edited RNA editing sites were editing on the ACOX1 (chr17: 75941513, p = 1.15e-05). Notably, ACOX1 was reported to play important roles in cancer development of HCC by stimulating hepatic fatty acid oxidation and H 2 O 2 accumulation (37).
To further investigate the biological function of these HCCrelated RNA editing sites, we performed functional enrichment analysis for these genes with the HCC-related RNA editing sites (hypergeometric test, adjust p < 0.05; Supplementary Figure S2, Supplementary Table S2). Our results indicated that HCCrelated editing genes prefer to be liver-specific genes, which were involved in nuclear transport, catabolic and cell cycle processes. The enrichment of liver-specific genes and liver cancer-associated genes suggested that RNA editing can be a potential research area to analyze the mechanism, clinical prevention, and treatment of HCC patients.

HCC-Related Editing Sites Prefer to Locate on the Same Genes With the Same Editing Patterns
A large percent of HCC-related RNA editing sites were located on gene regions (88.99%, 404/454) especially for HCC gain and dys-edited sites. Interestingly, 73 genes were shared by at least two HCC-related RNA editing sites and accounted for 50% (202 editing sites of 404) (Figure 4A, Supplementary Table S3). Notably, we found that different editing sites located on the same genes tend to be with the same HCC editing type, with few exceptions (Figure 4B, Table 2). For example, there are seven HCC-related RNA editing sites located on MDM4; meanwhile, these seven sites are all HCC gain editing patterns. About 89.04% genes with different editing sites have the same editing patterns, which indicated that these editing sites may have important roles in HCC initiation and development.

The Functional Consequence of RNA Editing Sites
Previous studies have suggested that RNA editing can be functional and involved in carcinoma initiation and progression by influencing amino acid encoding, alternative splicing, miRNA-target regulation, or expression changing of the corresponding gene [(38, 39); Figure 5A]. Totally, we identified 2,064 editing sites with functional consequence, of which 46 were HCC-related ( Figure 5B, Supplementary Table S4). For editing in coding sequence (CDS) regions, we identified 554 editing sites with functional change, including one stop loss, seven stop gain, 10 unknown, and 536 non-synonymous (Figures 5B,C), of which 11 editing sites were HCC related. Specifically, the serine-to-glycine substitution at residue 367 of AZIN1 was reported in a previous study; the editing may induce a conformational change and a cytoplasmic-to-nuclear translocation, which will result in tumor initiation and aggressive tumor progression (16). Another example is serine-to-threonine substitution at residue 1,768 of MUC6, which located in a proximate repeat region annotated in the Uniprot database (40). Importantly, abnormal expression of MUC6 was reported to be associated with many gastrointestinal cancers, such as HCC and cholangiocarcinoma (41). Hence, we supposed that several RNA editing located in important protein domains, which change the properties of the protein, might play roles in the progression of HCC. In terms of alternative splicing, 30 editing sites identified by ANNOVA can induce splicing change, of which no one was HCC related. As more than half of the RNA editing sites located on the 3 ′ UTR regions, next, we considered miRNA-target regulation change caused by RNA editing in the 3 ′ UTR regions. We use miRanda to calculate the binding energies between miRNA sequence and editing sequence (or reference sequence). We identified 1,356 editing sites that affect miRNA-target regulation, of which 26 were HCC related. Among these 26 HCC-related RNA editing sites, the top three affected miRNAs were miR-17-3p, miR-20b-3p, and miR-593-3p (with 11, 7, and 7 target regulation changes induced by HCC-related editing sites, respectively (Supplementary Table S5). Notably, miR-17-3p, miR-20b-3p, and miR-593-3p are widely involved in caner initiation and progression, including HCC (42)(43)(44)(45)(46). We observed that the expression levels of 163 RNA editing sites were correlated with the editing degree of corresponding genes, of which 10 were HCC-related editing sites. Moreover, Pearson correlation coefficients (PCCs) tend to be negative values (Figure 5D), which indicated that most of the RNA editing sites might cause downexpression of the corresponding genes.

HCC-Related RNA Editing Sites With Functional Consequence Can Predict Clinical Prognosis
To further investigate the relationship between RNA editing and HCC clinical characteristics, we first defined the risk score for each clinical sample by integrating the expression levels and editing degrees of HCC-related RNA editing sites with functional consequence, weighted by 1 or −1 according to the regression coefficients from univariate Cox regression analysis (detailed in the Materials and Methods section). As expected, we found that the risk scores measured by 46 HCC-related RNA   (Figures 6A,B). We further classified all HCC patients into high-risk and lowrisk groups using the median risk score as the cutoff point. Our analysis suggested that patients having higher risk scores were expected to have poor overall survival times (log-rank test, p = 0.0003; Figure 6C). The risk score was still significantly associated with patient overall survival (hazard ratio = 1.03, p = 0.01) in the Cox multivariate analysis, after adjusting for patients' gender, age, BMI, tumor grades, and stages ( Table 3), which indicates that these HCC risk RNA editing sites could be potential biomarkers to predict clinical outcomes of HCC patients. We next identified three prognostic-related editing sites whose editing degrees are significantly correlated with survival time using univariate Cox regression analysis (p < 0.05; Table 4). The median risk score using these three prognostic editing sites and matched genes can also significantly classify patients into separate groups with different clinical outcomes (log-rank test, p = 0.03, data not shown). After adjusting for patients' gender, age, BMI, tumor grades, and stages, the risk score still has predictive power (hazard ratio = 1.19, p = 0.001; Table 5). As HBV/HCV infection, alcoholic consumption, and non-alcoholic fatty liver diseases are the three main risk factors to HCC patients, we next examined the predictive power of these three editing sites with these risk factors. We found that these three prognosticrelated editing sites can be used to predict the clinical outcomes for HBV-/HCV-infected and non-alcoholic fatty liver disease patients (log-rank p-values were 2e-4 and 9e-4, respectively) ( Figures 6D,E), but not alcoholic consumption patients (data not shown). Furthermore, we dissected the function of these 46 HCCrelated RNA editing sites with functional consequence. It is shown that they were widely involved in three reactome pathways: "metabolism of proteins, " "generic transcription pathway, " and "cell cycle" (Figure 6F, Supplementary Table S6). The functional enrichment analysis for chemical and genetic perturbation datasets indicated that they were enriched in tumor differentiation, nRAS signaling pathway, and liver cancerassociated genes.

DISCUSSION
HCC is a complex disease with poor prognosis and affected by multiple genetic alterations. DNA mutations are the most widely investigated driver events in all cancer types, including HCC. However, the mutation events are not presented in all HCC patients. Furthermore, we found that 98.51% mutation sites occurred in only a single HCC patient. Similar to the consequence of DNA mutation, RNA editing can also produce nucleotide variations at the RNA level. Importantly, the RNA editing seems to be a more sophisticated regulation, as the editing degree can be 0-100%, while DNA mutation should be mutation or not (0 or 1). Here we systematically identified RNA editing sites by integrating DNA mutation and SNP datasets, and found that HCC samples have a significantly high percentage of RNA editing sites compared with normal samples. We also observed RNA editing in 35 cases that lacked DNA mutations in HCC risk genes, and more edited risk genes occurred in mutated patients compared with patients who do not have any DNA mutation in the HCC risk gene. Thus, we proposed that RNA editing may be a risk factor to HCC, as a complementary event for DNA mutation in HCC risk genes.
To further characterize the inferred RNA editing sites in our study, we compared the overlap of A-to-IRNA editing sites in gene regions identified by our study and the editing sites collected in the database of RADAR (Supplementary Figure S3). We found that most A-to-IRNA editing sites were in the RADAR database and accounts for 87.11% (9,565/10,981). Furthermore, we re-identified RNA editing sites in three human normal liver tissues and two HCC cell line samples, and found that 38.11% HCC-related RNA editing sites were edited in at least one sample (173/454), and 39.13% HCC-related editing sites with functional consequence were identified (18/46). These results indicated that RNA editing sites identified by our research were authentic, which can be used for further analysis. Next, HCC-related RNA editing sites were identified by comparing editing in HCC cancer samples and normal samples. Functional annotation showed that HCC-related editing sites prefer to be liver-specific genes, which suggested the important roles of RNA editing in the development of HCC.
Previous studies focused on the effect of ADAR enzymes on RNA editing. Here we addressed another important question on whether these editing sites have an effect on the downstream regulation system. By considering the influence of RNA editing on coding sequence change, alternative splicing, miRNA-target regulation, and expression change, we identified 2,064 editing sites with functional consequence, which accounts for 10.62% of the total RNA editing sites (2,064/19,431). This percentage indicates that RNA editing events may play other important roles, such as the alteration of RNA-binding abilities, which were induced by RNA structure change, which was, in turn, induced by editing. Notably, we found a number of editing sites with miRNA-target regulation changes. More were identified when we set lower thresholds (Supplementary Table S7). If we defined at least one miRNA relationship change by RNA editing, 90.81% editing sites in the 3 ′ UTR regions can induce miRNAtarget regulation change (7,651/8,425). Therefore, we should pay more attention to the effect of miRNA-target regulation change induced by RNA editing. Importantly, in regard to the tumor suppressive genes (TSGs) and the oncogenic genes (OGs) reported (28), we demonstrated that these functional HCCrelated editing genes were significantly enriched in OGs, but not TSGs (data not shown), and it remains true when we set lower thresholds to identify editing sites with miRNA-target regulation changes (Supplementary Table S7).
Importantly, we found 46 HCC-related RNA editing sites with functional consequence that can be used to predict the clinical outcome in HCC patients. In addition, they have independent predictive power after considering the gender, age, BMI, tumor grades, and stages. We identified three clinical prognostic-related editing sites, which can also provide predictive values in HBV-/HCV-infected patients and non-alcoholic fatty liver disease patients, including editing sites in ALDH2 (chr12:111811793, A-to-I RNA editing), CNTNAP3B (chr9:41929326, C-to-T RNA editing), and CBWD5 (chr9:65675990, T-to-G RNA editing), of which ALDH2 is an HCC-related editing gene with high frequency (six HCC loss editing sites locates on the ALDH2 gene region). Interestingly, multiple studies revealed that ALDH2 is highly correlated with the pathogenic mechanism, risk, and survival of liver cancer patients, including HCC (47)(48)(49)(50). Functional annotation suggested that these HCC-related editing sites with functional consequence are widely involved in liver cancer-associated genes, especially tumor suppressive genes, and cancer-associated pathways. Therefore, we assume that some RNA editing events may be "driver events" that promote cancer initiation and progression, as well as play a critical role in clinical survival in cancer patients.

AUTHOR CONTRIBUTIONS
ZB and JL conceived the study. ZB and JC processed the sequencing data. LW and FW analyzed the data. JC wrote the manuscript.