A plasma membrane protein based prognostic model in clear cell renal cell carcinoma


 Background

Renal cell carcinoma (RCC) accounts for 3% of human cancers, and clear cell renal cell carcinoma (ccRCC) is the most common pathological type of RCC. Cell surface proteins have been shown to play an important role in the occurrence and progression of various cancers. In this study, we focused on plasma membrane proteins (PMPs) to explore their potential value in ccRCC.
Methods

PMP expression profiles and clinical information of ccRCC patients were downloaded from the TCGA database. Using a series of bioinformatic methods, we established a plasma membrane protein prognostic model.
Results

Multivariate Cox regression analysis and the area under the receiver operating characteristic (ROC) curve indicated that this model was an effective independent predictor of ccRCC clinical outcomes. Combining the model with two other clinical characteristics, we constructed a nomogram to predict patient survival at 1, 3, and 5 years.
Conclusions

Our study is the first to explore the prognostic value of plasma membrane proteins in clear cell renal cell carcinoma. We hope our work provides a new perspective on ccRCC prognosis and draws attention to plasma membrane proteins in clear cell renal cell carcinoma.


Background
Renal cell carcinoma represents around 3% of all cancers, with the highest incidence occurring in Western countries [1]. Over the last two decades, until recently, its incidence increased by about 2% annually both worldwide and in Europe; in 2018, there were approximately 99,200 new RCC cases and 39,100 kidney cancer-related deaths in the European Union [2]. As the burden of cancer continues to grow, reliable prognostic tools to guide clinical decisions are urgently needed. Proteomic analysis of tumor tissue samples and the identification of potential protein biomarkers in serum or plasma are evolving fields. Direct analysis of proteins has several advantages over indirect approaches such as transcriptome analysis, although it requires more tissue and takes more time.
In recent years, cell surface proteins have come into focus because they are readily accessible and have the potential to become new drug targets. Plasma membrane proteins (PMPs) account for about 50% of the cell membrane weight, and their functions are complex and varied [3]. Studies show that plasma membrane proteins mediate or initiate phenotypic changes associated with malignant transformation, such as cell proliferation, adhesion, and migration [4][5]. HER-2, a receptor protein highly expressed in many types of cancer, can promote the proliferation and invasion of tumor cells when activated [6].
Some PMPs are differentially expressed between tumor and normal tissue and may be potential therapeutic targets or biomarkers. After analyzing VHL-associated changes in plasma membrane proteins, Aggelis V et al. identified 19 differentially expressed proteins that were found to be potential biomarkers for ccRCC [7]. These studies suggest that PMP dysregulation may be closely related to the occurrence and development of cancer. However, as far as we know, large-scale gene expression signatures have rarely been used to investigate the association between PMPs and ccRCC. A more comprehensive understanding of the effects of PMPs on tumors could help in the clinical diagnosis of renal cancer and even provide a new, precise direction for treatment.
In this study, we tried to clarify the possible role of PMPs in ccRCC and to explore their potential value for prognosis and targeted therapy. The PMP expression profiles and clinical information of ccRCC patients were downloaded from the TCGA database. We then identified differentially expressed PMPs through computational methods and used a number of bioinformatic analyses to study the underlying regulatory mechanisms. In addition, we built a prognostic model to predict the overall survival of ccRCC patients, which may also provide a new viewpoint for precise therapy of ccRCC. To verify the predictive performance of the model, we obtained ccRCC data from the GEO and ArrayExpress databases. The external validation results show that this PMP-based model has prognostic value in ccRCC. To our knowledge, our work reveals the prognostic value of PMPs in ccRCC for the first time. More experiments are needed to further explore the underlying mechanisms.
2 Materials and Methods

Data acquisition
Level 3 RNA sequencing and clinical data from 539 KIRC (kidney renal clear cell carcinoma) and 72 paracancerous samples were downloaded from The Cancer Genome Atlas (TCGA) database (https://tcga-data.nci.nih.gov/tcga/). To ensure data completeness, patients meeting either of the following criteria were excluded from subsequent analysis: (1) survival time of less than 30 days, or (2) missing information on stage, grade, age, or gender. Finally, 482 tumor samples from different individuals and 68 paracancerous samples were retained as the training set in this study.
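The two exclusion rules, plus the one-sample-per-individual requirement, can be expressed as a simple filter. The sketch below is written in Python for illustration (the study itself worked in R); the `Patient` record and its field names are hypothetical and do not correspond to actual TCGA column names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Patient:
    # Hypothetical record layout; field names are illustrative, not TCGA column names.
    patient_id: str
    survival_days: float
    stage: Optional[str]
    grade: Optional[str]
    age: Optional[int]
    gender: Optional[str]


def passes_criteria(p: Patient, min_days: float = 30) -> bool:
    """Exclusion rules from the text: survival < 30 days, or missing stage/grade/age/gender."""
    if p.survival_days < min_days:
        return False
    return all(v is not None for v in (p.stage, p.grade, p.age, p.gender))


def filter_cohort(patients: List[Patient]) -> List[Patient]:
    """Keep one eligible sample per individual (the first one encountered)."""
    seen = set()
    kept = []
    for p in patients:
        if p.patient_id in seen or not passes_criteria(p):
            continue
        seen.add(p.patient_id)
        kept.append(p)
    return kept


cohort = [
    Patient("A", 400, "Stage I", "G2", 61, "male"),
    Patient("B", 12, "Stage II", "G3", 55, "female"),  # survival < 30 days
    Patient("C", 900, None, "G1", 70, "male"),          # stage missing
    Patient("A", 400, "Stage I", "G2", 61, "male"),     # same individual again
]
print([p.patient_id for p in filter_cohort(cohort)])  # ['A']
```

A patient with exactly 30 days of follow-up is kept, because only survival times strictly below 30 days are excluded.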
Meanwhile, three microarray datasets (GSE29609, GSE22541, and E-MTAB-3267), which include 116 KIRC patients with corresponding clinical information, were downloaded from the GEO and ArrayExpress databases and used as the testing set for external validation. The "sva" R package was used to eliminate the batch effect.
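To show what batch-effect removal does to a single gene, here is a deliberately simplified location/scale adjustment in Python. It is an assumption-laden stand-in and not ComBat: the function the paper relies on in "sva" additionally shrinks the per-batch parameters with empirical Bayes and can protect biological covariates, whereas this sketch removes every between-batch difference. Dataset labels are hypothetical.

```python
import statistics
from typing import Hashable, List, Sequence


def adjust_batches(values: Sequence[float], batches: Sequence[Hashable]) -> List[float]:
    """Location/scale batch adjustment for ONE gene (a simplified stand-in, not ComBat).

    Each batch is standardized with its own mean and SD, then mapped back onto the
    pooled mean and SD, so all batches end up sharing a common location and scale.
    """
    pooled_mean = statistics.fmean(values)
    pooled_sd = statistics.pstdev(values)
    grouped = {}
    for v, b in zip(values, batches):
        grouped.setdefault(b, []).append(v)
    params = {b: (statistics.fmean(vs), statistics.pstdev(vs)) for b, vs in grouped.items()}
    adjusted = []
    for v, b in zip(values, batches):
        mean_b, sd_b = params[b]
        z = (v - mean_b) / sd_b if sd_b > 0 else 0.0
        adjusted.append(pooled_mean + z * pooled_sd)
    return adjusted


# One hypothetical gene measured in two datasets with a strong offset between them.
gene = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
batch = ["dataset_1", "dataset_1", "dataset_1", "dataset_2", "dataset_2", "dataset_2"]
adj = adjust_batches(gene, batch)
print(round(statistics.fmean(adj[:3]), 6), round(statistics.fmean(adj[3:]), 6))
```

After adjustment, both batches share the pooled mean, so the tenfold offset between datasets no longer dominates the gene's variation.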
The plasma membrane protein list was obtained from The Human Protein Atlas database (http://www.proteinatlas.org/). Since the data were downloaded from public databases, ethical approval was not required.

Differential gene analysis
To identify differentially expressed genes (DEGs) and differentially expressed PMPs (DEPMPs), the "limma" R package was used to normalize the expression matrix and then compare tumor with paracancerous tissues. DEGs and DEPMPs were identified using thresholds of |log fold change| > 1 and p value < 0.05. We extracted the DEPMPs from all DEGs and used GO and KEGG pathway enrichment analyses to investigate their molecular functions.
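The thresholding step and the intersection with the plasma membrane protein list can be sketched in Python as follows. The sketch assumes the per-gene log fold changes and p values have already been exported from limma. limma's logFC column is on the scale of the input data, which is usually log2. Gene names are placeholders.

```python
from typing import Dict, Iterable, List, Tuple


def select_degs(results: Dict[str, Tuple[float, float]],
                lfc_cut: float = 1.0, p_cut: float = 0.05) -> Tuple[List[str], List[str]]:
    """Split genes into up- and downregulated DEGs.

    results maps gene -> (logFC, p value), e.g. exported from a limma results
    table (assumed layout).
    """
    up, down = [], []
    for gene, (lfc, p) in results.items():
        if abs(lfc) > lfc_cut and p < p_cut:
            (up if lfc > 0 else down).append(gene)
    return sorted(up), sorted(down)


def select_depmps(degs: Iterable[str], pmp_list: Iterable[str]) -> List[str]:
    """DEPMPs are the DEGs that also appear on the plasma membrane protein list."""
    return sorted(set(degs) & set(pmp_list))


results = {
    "GENE_A": (2.3, 0.001),   # up, significant
    "GENE_B": (-1.5, 0.01),   # down, significant
    "GENE_C": (0.4, 1e-6),    # significant but small change
    "GENE_D": (3.0, 0.20),    # large change but not significant
}
up, down = select_degs(results)
print(up, down)                                        # ['GENE_A'] ['GENE_B']
print(select_depmps(up + down, {"GENE_A", "GENE_C"}))  # ['GENE_A']
```

Both conditions must hold at once. A gene with a tiny p value but a small fold change, such as GENE_C, is therefore not selected.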

Survival related DEPMPs' molecular characteristics
We screened survival related DEPMPs through univariate Cox proportional hazards regression analysis. To comprehensively explore the clinical value of these survival related DEPMPs, several public databases were used. A protein-protein interaction (PPI) network was constructed by submitting the gene list to the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) database (https://string-db.org/). Since transcription factors (TFs) play an important role in the initiation of gene expression, we also built a transcription factor regulatory network using the Cistrome Cancer database (http://cistrome.org/CistromeCancer/), which contains over 23,000 ChIP-seq and chromatin accessibility profiles from human and mouse genomes and provides binding information for 318 TFs [8]. The network of interactions between the DEPMPs and their corresponding TFs was constructed with Cytoscape software.
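Univariate Cox screening fits one model per gene, h(t | x) = h0(t) exp(b x), and retains genes whose coefficient is significant. The Python sketch below fits a single gene by Newton-Raphson on the partial likelihood, using Breslow handling of ties, and applies a Wald test. It only illustrates the computation. In R, `survival::coxph` would normally be used, and its default tie handling (Efron) differs slightly. The expression values and follow-up times are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def score_and_information(b: float, x: Sequence[float], time: Sequence[float],
                          event: Sequence[int]) -> Tuple[float, float]:
    """Score U(b) and observed information I(b) of the Cox partial likelihood
    for a single covariate (Breslow convention for tied event times)."""
    score, info = 0.0, 0.0
    for i in range(len(x)):
        if not event[i]:
            continue
        risk_set = [j for j in range(len(x)) if time[j] >= time[i]]
        w = [math.exp(b * x[j]) for j in risk_set]
        s0 = sum(w)
        s1 = sum(wj * x[j] for wj, j in zip(w, risk_set))
        s2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk_set))
        score += x[i] - s1 / s0
        info += s2 / s0 - (s1 / s0) ** 2
    return score, info


def univariate_cox(x, time, event, max_iter=50, tol=1e-10) -> Dict[str, float]:
    """Newton-Raphson fit of the coefficient b, with a two-sided Wald p value."""
    b = 0.0
    for _ in range(max_iter):
        score, info = score_and_information(b, x, time, event)
        step = score / info
        b += step
        if abs(step) < tol:
            break
    _, info = score_and_information(b, x, time, event)
    se = 1.0 / math.sqrt(info)
    z = b / se
    return {"coef": b, "hr": math.exp(b), "se": se,
            "p": math.erfc(abs(z) / math.sqrt(2))}


# Hypothetical expression of one PMP and follow-up data (event 1 = death).
expr = [0.5, 1.2, 2.0, 2.8, 3.1, 0.2, 1.8, 2.5]
time = [10, 8, 3, 2, 5, 12, 6, 4]
event = [1, 0, 1, 1, 1, 1, 1, 0]
fit = univariate_cox(expr, time, event)
print(fit)  # coef > 0: higher expression is associated with higher hazard
```

Screening then reduces to running `univariate_cox` for every DEPMP and keeping those with `p` below the chosen threshold.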

Plasma membrane protein prognostic model (PMPPM)
We then used these survival related PMPs to construct a prognostic model based on multivariate Cox proportional hazards regression analysis. Each patient's risk score was obtained by multiplying the expression level of each PMP in the model by its Cox regression coefficient and summing the products. Using the median risk score as the cutoff, patients were divided into high-risk and low-risk groups, and survival was compared using the Kaplan-Meier (K-M) method. In addition, the predictive value of the model was evaluated with multivariate Cox proportional hazards regression analysis and with the area under the curve (AUC) of the receiver operating characteristic (ROC) curve, computed using the R package "survivalROC".
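The risk score and median split can be sketched in Python as follows. Gene names and coefficients are hypothetical and are not the values in Table 2. The cutoff is a parameter, so the training-set median can be reused unchanged on an external cohort.

```python
import statistics
from typing import Dict, List, Optional, Sequence, Tuple


def risk_scores(expr: Sequence[Dict[str, float]], coefs: Dict[str, float]) -> List[float]:
    """Risk score = sum over model genes of (Cox coefficient x expression level)."""
    return [sum(c * sample[g] for g, c in coefs.items()) for sample in expr]


def split_by_median(scores: Sequence[float],
                    cutoff: Optional[float] = None) -> Tuple[float, List[str]]:
    """Label scores above the cutoff 'high' and the rest 'low'.

    By default the cutoff is the median of the given scores; pass the training-set
    median explicitly to label an external cohort with the same threshold.
    """
    if cutoff is None:
        cutoff = statistics.median(scores)
    return cutoff, ["high" if s > cutoff else "low" for s in scores]


coefs = {"GENE_A": 0.5, "GENE_B": -0.3}  # hypothetical coefficients
train = [
    {"GENE_A": 2.0, "GENE_B": 1.0},
    {"GENE_A": 1.0, "GENE_B": 2.0},
    {"GENE_A": 4.0, "GENE_B": 0.0},
    {"GENE_A": 0.0, "GENE_B": 1.0},
]
scores = risk_scores(train, coefs)
cutoff, groups = split_by_median(scores)
print(groups)  # ['high', 'low', 'high', 'low']

test = [{"GENE_A": 1.5, "GENE_B": 0.5}]
_, test_groups = split_by_median(risk_scores(test, coefs), cutoff)
print(test_groups)  # ['high']
```

A score exactly equal to the median is assigned to the low-risk group here. That tie-breaking rule is an arbitrary choice in this sketch.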
We further explored the relationship between the PMPPM and clinical characteristics, including age, gender, grade, TNM stage, T stage, and metastasis. The characteristics of each PMP in the model were retrieved from the UALCAN database (http://ualcan.path.uab.edu/index.html), and copy number variation information was obtained from the cBioPortal database (http://www.cbioportal.org/) [9]. Finally, a nomogram was established to predict patients' 1-year, 2-year, and 3-year overall survival.
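A Cox-based nomogram converts each predictor into points on a common scale. The points are then summed, and the resulting linear predictor is mapped to survival through S(t | x) = S0(t)^exp(lp). The Python sketch below follows the common convention of giving the predictor with the largest effect range 0 to 100 points. Predictor names, coefficients, ranges, and numeric stage coding are hypothetical. The baseline survival S0(t) would come from the fitted model and must correspond to lp = 0.

```python
import math
from typing import Callable, Dict, Tuple


def build_nomogram(coefs: Dict[str, float], ranges: Dict[str, Tuple[float, float]]
                   ) -> Tuple[Callable[[str, float], float], Callable[[float], float]]:
    """Points scale: the predictor whose effect (|coef| x range) is largest spans
    0-100 points, and the other predictors are scaled by the same factor."""
    spans = {k: abs(coefs[k]) * (ranges[k][1] - ranges[k][0]) for k in coefs}
    scale = 100.0 / max(spans.values())
    # Value of each predictor that earns 0 points (its lowest-risk end).
    ref = {k: ranges[k][0] if coefs[k] >= 0 else ranges[k][1] for k in coefs}
    lp_offset = sum(coefs[k] * ref[k] for k in coefs)

    def points(name: str, value: float) -> float:
        return scale * coefs[name] * (value - ref[name])

    def linear_predictor(total_points: float) -> float:
        return total_points / scale + lp_offset

    return points, linear_predictor


def survival_probability(lp: float, baseline_survival: float) -> float:
    """Cox model: S(t | x) = S0(t) ** exp(lp), with S0(t) the baseline survival at t."""
    return baseline_survival ** math.exp(lp)


coefs = {"risk_score": 1.2, "age": 0.03, "stage": 0.6}  # hypothetical
ranges = {"risk_score": (-1.0, 3.0), "age": (30, 90), "stage": (1, 4)}
points, linear_predictor = build_nomogram(coefs, ranges)
patient = {"risk_score": 1.0, "age": 60, "stage": 2}
total = sum(points(k, v) for k, v in patient.items())
lp = linear_predictor(total)  # recovers sum(coef * value)
print(round(total, 2), round(lp, 4))
```

Reading a nomogram by hand follows the same steps: look up each predictor's points, add them, and read the survival probability off the total-points axis.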

Statistical analysis
Heatmaps of DEGs and DEPMPs were plotted using the "pheatmap" R package with zero-mean normalization.
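Zero-mean normalization for a heatmap is typically a per-gene (row) z-score. The Python sketch below assumes the row-wise form, comparable to pheatmap's `scale = "row"` option, and uses the sample standard deviation.

```python
import statistics
from typing import List, Sequence


def zscore_rows(matrix: Sequence[Sequence[float]]) -> List[List[float]]:
    """Center each gene (row) at zero mean and scale to unit sample SD."""
    out = []
    for row in matrix:
        mu = statistics.fmean(row)
        sd = statistics.stdev(row)
        out.append([(v - mu) / sd if sd > 0 else 0.0 for v in row])
    return out


print(zscore_rows([[1.0, 2.0, 3.0], [5.0, 5.0, 5.0]]))  # [[-1.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
```

Scaling by row makes colors comparable across genes with very different absolute expression levels. Constant rows are mapped to zeros instead of producing a division by zero.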
Differences between two groups shown in boxplots were tested using the Wilcoxon test. The "clusterProfiler" R package was used for GO and KEGG pathway enrichment analyses. The area under the ROC curve was calculated with the "survivalROC" R package. For Kaplan-Meier curves, p values and hazard ratios (HRs) with 95% confidence intervals (CIs) were generated by log-rank tests and univariate Cox proportional hazards regression, respectively. All analyses were performed using R software version 3.6.1 (The R Foundation for Statistical Computing, 2019). All statistical tests were two-sided, and P < 0.05 was considered statistically significant.
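The log-rank test compares the observed number of deaths in one group with the number expected if both groups shared the same hazard, summing over all event times. The Python sketch below implements the standard two-group version with hypergeometric variance, where the p value comes from a chi-square distribution with one degree of freedom. It illustrates the statistic behind R's `survival::survdiff`, not that function itself. The data are hypothetical.

```python
import math
from typing import Sequence, Tuple


def logrank_test(time: Sequence[float], event: Sequence[int],
                 group: Sequence[int]) -> Tuple[float, float]:
    """Two-group log-rank test (group coded 0/1). Returns (chi-square, p value)."""
    observed_minus_expected, variance = 0.0, 0.0
    for t in sorted({t for t, e in zip(time, event) if e}):
        n = sum(1 for ti in time if ti >= t)
        n1 = sum(1 for ti, g in zip(time, group) if ti >= t and g == 1)
        d = sum(1 for ti, e in zip(time, event) if ti == t and e)
        d1 = sum(1 for ti, e, g in zip(time, event, group) if ti == t and e and g == 1)
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            # Hypergeometric variance of the group-1 deaths at this event time.
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = observed_minus_expected ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return chi2, math.erfc(math.sqrt(chi2 / 2))


chi2, p = logrank_test(time=[1, 2, 3, 4], event=[1, 1, 1, 1], group=[1, 1, 0, 0])
print(round(chi2, 4), round(p, 4))
```

In this toy example, the two patients in group 1 die first. That early excess of deaths produces a positive observed-minus-expected count, although with only four patients the difference is far from significant.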

Identification of DEPMPs
The patients' clinical information is shown in Table 1. A total of 7369 DEGs were screened, including 5467 upregulated and 1902 downregulated genes (Fig. 1A). Among these DEGs, 98 upregulated and 61 downregulated DEPMPs were found (Fig. 1B). GO enrichment analysis indicated that these DEPMPs were mainly enriched in actin filament organization in the BP category, in membrane region in the CC category, and in cell adhesion molecule binding in the MF category (Fig. 2A). KEGG pathway enrichment analysis showed that the DEPMPs were mainly enriched in actin filament organization as well as regulation of actin filament organization (Fig. 2B).

Characteristics of survival related PMPs
A total of 101 survival related DEPMPs were identified by univariate Cox regression analysis. The PPI network shows that these proteins interact with each other in ccRCC (Fig. 3). In addition, the expression profiles of 318 transcription factors (TFs) were examined, and 60 of them were differentially expressed between ccRCC and normal tissues (Fig. 4A). We then established a network with the 101 survival related DEPMPs and these 60 TFs, using cutoffs of correlation score > 0.4 and P value < 0.01. The resulting regulatory network illustrates the relationships between the TFs and these PMPs (Fig. 4B).
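The TF-PMP edges can be obtained by correlating each TF with each PMP across samples and keeping the pairs that pass both cutoffs. The text does not name the correlation method, so this Python sketch makes three assumptions: it uses Pearson correlation, it applies the 0.4 cutoff to |r|, and it approximates the p value with Fisher's z transform instead of the exact t-based test in R's `cor.test`. All names and values are hypothetical.

```python
import math
import statistics
from typing import Dict, List, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_p(r: float, n: int) -> float:
    """Two-sided p value from Fisher's z transform (normal approximation)."""
    r = max(min(r, 1 - 1e-12), -1 + 1e-12)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def tf_pmp_edges(tf_expr: Dict[str, Sequence[float]], pmp_expr: Dict[str, Sequence[float]],
                 r_cut: float = 0.4, p_cut: float = 0.01) -> List[Tuple[str, str, float]]:
    """Keep TF-PMP pairs with |r| > r_cut and p < p_cut (an edge list for Cytoscape)."""
    edges = []
    for tf, x in tf_expr.items():
        for pmp, y in pmp_expr.items():
            r = pearson_r(x, y)
            if abs(r) > r_cut and correlation_p(r, len(x)) < p_cut:
                edges.append((tf, pmp, r))
    return edges


tfs = {"TF_1": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}
pmps = {"PMP_1": [2, 4, 5, 8, 10, 12, 13, 16, 18, 20],
        "PMP_2": [5, 3, 6, 2, 7, 1, 6, 4, 5, 3]}
for edge in tf_pmp_edges(tfs, pmps):
    print(edge)
```

The resulting (TF, PMP, r) tuples can be written out as a table and imported into Cytoscape. The sign of r can then be used to style activating and repressive edges differently.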

Construction and analysis of the PMP prognostic model
We constructed a prognostic model based on the results of multivariate Cox regression analysis (Table 2). The model divided ccRCC patients into two groups with different clinical outcomes (Fig. 5, Fig. 6A). The AUC was 0.758, indicating that the model has some potential for survival prediction (Fig. 6B). After adjustment for age, gender, tumor grade, tumor stage, tumor size, distant metastasis status, and other parameters, multivariate Cox regression analysis showed that the PMPPM was an independent predictive factor (Fig. 6C, 6D). The risk score was significantly higher in patients with advanced grade, advanced stage, or distant metastasis (Fig. 7). For the nine genes in the model, the differences in expression between tumor and normal tissue are shown as box plots (Fig. 8A). Among their genetic alterations, amplification was the most common type, and CYFIP2 was the most frequently mutated gene (Fig. 8B). In addition, we obtained their protein and pan-cancer mRNA expression levels and survival analyses from the UALCAN database as supplementary information (Figs. S1-S3).
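The survival curves of the two risk groups are Kaplan-Meier product-limit estimates, computed separately for each group. The Python sketch below shows the estimator on hypothetical follow-up data. In R, the curves would normally come from `survival::survfit`.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(time: Sequence[float], event: Sequence[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate: S(t) = prod over event times t_i <= t of (1 - d_i / n_i)."""
    curve, surv = [], 1.0
    for t in sorted({t for t, e in zip(time, event) if e}):
        n_at_risk = sum(1 for ti in time if ti >= t)
        deaths = sum(1 for ti, e in zip(time, event) if ti == t and e)
        surv *= 1 - deaths / n_at_risk
        curve.append((t, surv))
    return curve


# Hypothetical follow-up (months) for one risk group; event 1 = death, 0 = censored.
print(kaplan_meier(time=[1, 2, 2, 3, 4], event=[1, 1, 0, 1, 0]))
```

Censored patients, such as the one censored at month 2, do not reduce the survival estimate themselves. They only leave the risk set, which makes each later death count for more.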

Validation of PMP prognostic model
After eliminating batch effects with the R package "sva", we used the expression data from the GEO and ArrayExpress databases to validate the PMP prognostic model. The risk score of every patient in the testing set was calculated as above, and patients were divided into high-risk and low-risk groups based on the median risk score of the training set. The high-risk group again had a visibly worse prognosis than the low-risk group (Fig. 9A). The AUC of the ROC curve for the risk score was 0.741, indicating that it performed well in assessing and predicting the prognosis of patients with ccRCC (Fig. 9B). Taken together, these results suggest that the PMP prognostic model has reasonable efficiency and potential clinical application value.
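A time-dependent AUC at horizon t measures how often a patient who died by t has a higher risk score than a patient still followed up after t. The Python sketch below computes a naive version that simply drops patients censored before t. It is not the estimator used by survivalROC, whose nearest-neighbor and Kaplan-Meier-based methods account for censoring. The sketch only illustrates what the reported AUC values measure. The data are hypothetical.

```python
from typing import Sequence


def naive_time_auc(scores: Sequence[float], time: Sequence[float],
                   event: Sequence[int], t: float) -> float:
    """Cumulative/dynamic AUC at horizon t, without censoring adjustment.

    Cases: died by time t. Controls: still under follow-up after t.
    Patients censored before t are simply dropped.
    """
    cases = [s for s, ti, e in zip(scores, time, event) if ti <= t and e]
    controls = [s for s, ti in zip(scores, time) if ti > t]
    if not cases or not controls:
        raise ValueError("need at least one case and one control at horizon t")
    concordant = 0.0
    for c in cases:
        for k in controls:
            concordant += 1.0 if c > k else 0.5 if c == k else 0.0
    return concordant / (len(cases) * len(controls))


scores = [3.0, 1.5, 1.0, 0.5, 2.0]
time = [1, 2, 5, 6, 3]
event = [1, 1, 0, 0, 0]
print(round(naive_time_auc(scores, time, event, t=2.5), 4))  # 0.8333
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of cases from controls. The reported values of 0.758 and 0.741 fall between these extremes.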

Predictive nomogram
All independent prognostic factors identified by multivariate Cox regression analysis, including the PMPPM, were used to establish a nomogram (Fig. 10).

Discussion
Protein is the material basis of life, the basic organic matter that constitutes cells, and the main executor of life activities. Membrane proteins play an important role in many biological processes, such as cell proliferation and differentiation, energy conversion, signal transduction, and material transport. It is estimated that about 60% of drug targets are membrane proteins. Abnormal membrane protein expression causes a variety of diseases, including cancer. In recent years, research on the structure and function of membrane proteins has become a hot topic [10]. However, the exact mechanism and role of plasma membrane proteins in renal cell carcinoma remain unclear. In this study, we downloaded a large amount of ccRCC data from TCGA, which enabled a comprehensive analysis of plasma membrane proteins in ccRCC patients. After comparing gene expression between ccRCC and normal tissues, we identified 159 DEPMPs. GO and KEGG pathway enrichment analyses show that these PMPs were mainly enriched in actin filament organization. Studies have found that actin filament organization plays a role in a variety of tumors, such as prostate cancer, head and neck cancers, and melanomas [11]. However, we found no reports on actin filament organization in renal cancer, and further experimental exploration is needed.
A total of 101 survival related PMPs were screened out by univariate Cox regression analysis. With the help of online resources, we learned more about the molecular characteristics and internal and external relationships of these survival related PMPs. First, the protein-protein interaction network shows that these PMPs interact closely with each other. Second, the TF-PMP network shows that the transcription factors FOXM1, NCAPG, CENPA, MYBL2, EOMES, IRF4, IKZF1, and BATF are closely related to these PMPs. Previous studies have shown that these TFs are connected with the occurrence and progression of ccRCC [12][13][14][15][16][17][18]. Taken together, these findings suggest that these PMPs, as a whole, may play a significant role in ccRCC.
To explore whether these survival related PMPs have prognostic value in ccRCC, we constructed a prognostic model based on multivariate Cox regression analysis. Survival and ROC analyses indicated that the model has considerable prognostic value, and these results were confirmed with external data. In addition, we comprehensively analyzed the relationship between the model and clinical parameters. The risk score was higher in patients with advanced grade, advanced stage, or distant metastasis. We also found that the nine PMPs in the model were closely related to tumor grade, stage, and distant metastasis.
With the help of an online database, we explored these PMPs further. We found that mutations are common in these genes and that CYFIP2 was the most frequently mutated gene. CYFIP2 (cytoplasmic FMR1-interacting protein 2) has been reported to be a candidate p53 target gene, and CYFIP2-induced apoptosis is part of a coordinated p53-dependent response in cancer cells [20]. Nevertheless, studies on CYFIP2 in kidney cancer are rare. Survival analysis of CYFIP2 in subgroups defined by tumor grade, race, and gender showed that CYFIP2 was closely related to the overall survival of ccRCC patients. Given the high frequency of CYFIP2 mutation, we think more attention should be paid to its specific mechanism in ccRCC. Current research shows that the remaining eight PMPs are also closely related to cancer. In this study, we found that they were associated with overall survival in patients with ccRCC. However, existing research cannot yet fully explain the mechanisms of these genes. Thus, larger prospective studies and basic experiments are needed to further define the relationship between kidney cancer and plasma membrane proteins.
It should be noted that this study has some limitations. First, the molecular mechanisms behind the key PMPs remain unclear, and our findings need to be validated by further experiments. Second, other studies may reach different results because of different experimental variations and statistical methods.
Notwithstanding these limitations, this study focused on the potential molecular mechanisms and clinical significance of PMPs. We hope this prognostic model will inspire medical scientists working on ccRCC prognosis and precise therapy.

Conclusion
Clear cell renal cell carcinoma (ccRCC) is the most common pathological type of renal cell carcinoma.
Many studies have indicated that plasma membrane proteins play a non-negligible part in the occurrence and progression of tumors. This work focused on the potential molecular mechanisms and clinical significance of plasma membrane proteins. For the first time, we systematically and comprehensively analyzed the plasma membrane protein expression profile data of clear cell renal cell carcinoma and constructed a valuable prognostic model. We hope that this model could serve as a beneficial complement to ccRCC diagnosis and treatment.

Declarations
Ethics approval and consent to participate

Figure 3
Protein-protein interaction network of survival related DEPMPs.

Figure 10
The nomogram plot was built based on three independent prognostic factors in ccRCC.

Supplementary Files
This is a list of supplementary files associated with this preprint.