Identification of proteomic markers for prediction of the response to 5-Fluorouracil based neoadjuvant chemoradiotherapy in locally advanced rectal cancer patients

Neoadjuvant chemoradiotherapy (nCRT) prior to surgery is the standard treatment for patients with locally advanced rectal cancer (LARC), while parts of them show poor therapeutic response accompanied by therapy adverse effects. Predictive biomarkers for nCRT response could facilitate the guidance on treatment decisions but are still insufficient until now, which limits the clinical applications of nCRT in LARC patients. In our study, 37 formalin-fixed paraffin-embedded tumor biopsies were obtained from patients with LARC before receiving 5-fluorouracil based nCRT. Proteomics analyses were conducted to identify the differentially expressed proteins (DEPs) between total responders (TR) and poor responders (PR). The DEPs were validated via ROC plotter web tool and their predictive performance was evaluated by receiver operating characteristic analysis. Functional enrichment analyses were performed to further explore the potential mechanisms underlying nCRT response. Among 3,998 total proteins, 91 DEPs between TR and PR were screened out. HSPA4, NIPSNAP1, and SPTB all with areas under the curve (AUC) ~ 0.8 in the internal discovery cohort were independently validated by the external mRNA datasets (AUC ~ 0.7), and their protein levels were linearly correlated with the graded responses to nCRT in the internal cohort. The combination of HSPA4 and SPTB could distinctly discriminate the TR and PR groups (AUC = 0.980, p < 0.0001). Moreover, multiple combinations of the three proteins realized increased specificity and/or sensitivity, while achieving favorable predictive value when moderate responders were introduced into the ROC analysis. Pathways including DNA damage repair, cell cycle, and epithelial mesenchymal transition were involved in nCRT response according to the enrichment analysis results. HSPA4, SPTB and NIPSNAP1 in tumor biopsies and/or their optional combinations might be potential predictive markers for nCRT response in patients with LARC. The DEPs and their related functions have implications for the potential mechanisms of treatment response to nCRT in patients with LARC.

new cases and ~ 0.34 million deaths in 2020 [1]. Due to the late detection or delayed diagnosis, patients are often diagnosed with locally advanced rectal cancer (LARC) in which the tumor has grown into the outermost layers (AJCC T3) or through the rectum wall into local adjacent structures (AJCC T4), with or without positive regional lymph node metastases.
LARC is regularly treated with trimodality therapy comprising preoperative neoadjuvant chemoradiation therapy (nCRT), surgery (total mesorectal excision; TME), and adjuvant chemotherapy. The standardized regimen of nCRT includes neoadjuvant radiotherapy and concomitant chemotherapy with the intravenous 5-fluorouracil (5-FU) or its oral analogue such as capecitabine. Benefits of this approach include significant tumor downsizing, downstaging, and the potential for achieving pathologic complete response (pCR), therefore improving resectability, anal sphincter preservation, local control and survival after radical surgery in patients with LARC [2][3][4][5]. Despite the effective trimodality therapy, the response to nCRT varies among patients with LARC, which is associated with long-term outcomes including recurrence-free survival, distant metastasis, and local recurrence rates [4]. Complete response is found in only about 15 ~ 18% of the LARC patients [4][5][6][7], while a subset of patients (~ 20%), being treated with the same regimen, do not achieve a favorable response -suggesting potential resistance to nCRT. LARC patients who are evaluated to be resistant to nCRT would be supposed to receive optimized treatment earlier in their clinical management. Therefore, there is a strong need for predictive biomarkers to identify the subsets of patients who are resistant or sensitive to nCRT before therapy, which would certainly benefit personalized treatment strategies for patients with LARC.
Clinical factors including tumor size, T and N stage, pathological features and imaging modalities have been shown to facilitate the prediction of the response to nCRT; however, their clinical application is limited due to the moderate sensitivity and specificity in prediction [8][9][10][11][12][13]. Based on the literature, tissue-derived molecular biomarkers involved in DNA mutation and methylation, gene expression profiles, protein and metabolites, the tumor immune microenvironment, and microRNAs have been considered for their potential to predict nCRT response early with promising efficacy [8]. Nevertheless, few viable biomarkers have reached clinical application.
At present, the standard assessment for rectal cancer response to neoadjuvant therapy is the tumor regression grade (TRG). Among existing TRG systems, the American Joint Committee on Cancer (AJCC) TRG system has shown superior prediction for survival outcomes [14]. In this study, we grouped patients with LARC into different response groups according to the AJCC TRG system. Then, proteins were extracted from formalinfixed paraffin-embedded (FFPE) biopsies of patients with LARC before nCRT, before analyzing with liquid chromatography with tandem mass spectrometry (LC-MS/ MS) to characterize the tumor proteomic signature and identify the differentially expressed proteins (DEPs). The signaling pathways these DEPs involved in were explored to gain an insight into their potential roles in treatment response. Furthermore, external mRNA datasets were used to verify the levels of DEPs in different response groups, and the predictive abilities of verified DEPs were assessed using receiver operating characteristic (ROC) curves, laying the groundwork for future clinical application.

Patients and samples
Our study finally enrolled 42 patients diagnosed with LARC (clinical stage T3-4N0 or T1-4N1-2) and treated with pre-operative 5-FU-based nCRT followed with surgery from 2010 to 2019 at the Chinese PLA General Hospital. Their biopsy tissues were collected at the time of pre-treatment staging.
Patients had to fulfill the following eligibility criteria: (a) completion of diagnosis and treatment process in a single center with available pre-treatment FFPE biopsies; (b) histologically confirmed adenocarcinoma; and (c) completion of 5-FU-based nCRT followed by TME surgery as the first therapeutic approach. Initial clinical staging was based on rectoscopy, thorax-abdomen computed tomography (CT) scan and/or pelvic magnetic resonance imaging (MRI). All patients were assigned to pelvic longcourse radiotherapy (50 Gy in 25 fractions over 5 weeks of three-dimensional conformal radiotherapy, 2 Gy per fraction, per day) with concurrent 5-FU-based chemotherapy. Standardized surgery (including abdominoperineal resection; anterior resection; low anterior resection; Hartmann's operation) was performed after an interval of 6 to 14 weeks post nCRT. The study was approved by the Medical Ethics Committee of the Chinese PLA General Hospital (No. S2021-129-01).

Collection of clinical data
Clinical data collected from the medical records of patients are detailed in Tables 1 and 2, including age at diagnosis, gender, histological features (grade of differentiation and mucinous histology), and the tumor location (the distance from the anal verge (AV) to the lowest margin of the tumor on either the MRI or rectoscopy), blood routine indices and the levels of tumor biomarkers (detected before nCRT), etc. Pathologic results were reported according to the 8th AJCC TNM staging classification system. Treatment responses to nCRT and tumor regression grade (TRG) were evaluated by experienced pathologists in accordance with the AJCC TRG system [14]. According to the TRG, patients were divided into three groups: total responders (TR: AJCC TRG0), moderate responders (MR: AJCC TRG1), and poor responders (PR: AJCC TRG2+3). To screen out proteins most related to treatment reaction, the TR and PR groups were defined as the internal discovery cohort used for differential expression analysis, while all of the TR, MR, and PR groups were defined as the total internal cohort.

Protein extraction
Proteins from FFPE samples were extracted using the FFPE Total Protein Extraction Kit (Sangon Biotech, Shanghai, China) following the manufacturer's instruction. Then samples were precipitated with 4 × (v/v) ice-cold acetone and incubated at -20 ℃ overnight. The precipitated proteins were centrifuged at 10,000×g at 4℃ for 15 min and air-dried.

Proteomic analysis
Samples were digested following the filter-aided sample preparation (FASP) method with Amicon Ultra-0.5 centrifugal filters (Merck Millipore, Darmstadt, Germany) with 10 kDa MW cut-off. In detail, protein samples were dissolved in 8 M urea. 100 μg protein was reduced with 10 mM dithiothreitol (DTT) at 37 ℃ for 120 min and alkylated with 20 mM iodoacetamide (IAA) in the same solvent (23 ℃ for 30 min, in darkness), followed by a three-washes with 50 mM ammonium bicarbonate and digestion with 2 μg trypsin (V5111; Promega, Madison, WI, USA) overnight at 37 ℃. Post vacuum drying, peptides were reconstituted in 0.1% formic acid for MS analysis.
LC-MS/MS analysis of tryptic peptides was performed on a quadrupole Orbitrap mass spectrometer (Q Exactive, Thermo Fisher Scientific, Waltham, MA, USA) coupled to Dionex Ultimate-3000 HPLC system (Thermo Fisher Scientific) via a nano-electrospray ion source. The peptides (~ 350 ng) of peptides were separated using an in-house packed C18 analytical column (Magic C18 particles, 3 μm, Michrom Bioresource, Auburn, CA, USA) and measured over a total gradient length of 120 min

Analysis of MS data
Q-Exactive raw data were searched against the uniprot_ sprot.fasta database of Homo sapiens protein sequence and processed to calculate the area for each protein using Proteome Discoverer (PD) software v2.1.1.21 (Thermo Scientific). For a more in-depth statistical analysis, the area data were then processed using Perseus v1.6.15.0 (freeware, Max Planck Institute of Biochemistry). Normalized and log2-transformed signal intensity values were used for subsequent analyses. Differentially expressed proteins (DEPs) were defined as proteins with a fold change > 1.5 between the TR and PR groups and a p < 0.05 (Student's t-test). Volcano plots, heatmaps, and partial least squares discriminant analysis (PLS-DA) were performed using pre-processed data with the online tools MetaboAnalyst 5.0 (https:// www. metab oanal yst. ca) and Hiplot (https:// hiplot. com. cn).

External validation
External validation was performed with the ROC plotter (www. rocpl ot. org) [15], an online database that links gene expression and response to therapy using transcriptome-level data of patients with RC. Within the database, a total of 56 patients with RC treated with radiotherapy and capecitabine were grouped into responders (n = 15) and non-responders (n = 41) according to the Response Evaluation Criteria in Solid Tumors (RECIST) criteria. The levels of probes corresponding to the DEPs of the internal discovery cohort in the responder and nonresponder groups were compared, and probes with significant difference between the two groups were further analyzed using the ROC curve so as to evaluate their predictive performance.

GSCALite
GSCALite (http:// bioin fo. life. hust. edu. cn/ web/ GSCAL ite/) [16] was used to analyze the relationship between genes and pathways by a line connection, as well as the correlation of gene expression and 5-FU sensitivity based on the Cancer Therapeutics Response Portal (CTRP) and Genomics of Drug Sensitivity in Cancer (GDSC) drug response datasets. The effects of genes on activation or  inhibition of cancer-related pathways in specific type of cancer were evaluated using reverse phase protein array (RPPA) data from The Cancer Proteome Atlas (TCPA).
The following pathways which are closely related to cancer therapy resistance were explored: epithelial mesenchymal transition (EMT), cell cycle, apoptosis pathways.

Bioinformatic analysis
To explore the possible underlying mechanism, functional gene set enrichment analysis (GSEA) was performed on the whole proteome data of the internal discovery cohort using Hallmark gene sets with default parameters. The GSEA results were filtered based on a nominal (NOM) p-value < 0.01 and the Normalized Enrichment Scores (NES) was used to identify topranked gene sets in TR versus PR. Then, for the DEPs, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment were performed using DAVID Bioinformatics Resources v6.8 (https:// david. ncifc rf. gov/ home. jsp) and visualized using Barplot (v0.1.1) in Hiplot. Pathways with p < 0.05 were considered significantly enriched. The protein-protein interaction (PPI) network was assessed using the STRING database (https:// string-db. org/) and visualized by Cytoscape v3.8.2 [17]. The DEGs were ranked using the 'cytoHubba' plugin of Cytoscape and displayed with color transition.

Statistical analysis
Statistical analysis was performed using Graphpad Prism v7.0 (GraphPad, San Diego, CA, USA). Statistical comparisons between the TR and PR groups were performed by Student t-test, Manny-Whitney U-test or chi-square test as appropriate. To investigate the linear relationship between each protein level and therapeutic responses, the p-value for trend was calculated by treating TR, MR, and PR groups as continuous variables. Correlation analysis between potential biomarkers and clinical features was performed using Spearman's coefficient of correlation. To evaluate the discrimination performance of the protein levels between different groups, the ROC curve analysis was conducted, and the true positive rate (TPR), true negative rate (TNR), area under the curve (AUC) were calculated. Statistical significance was set at p < 0.05.

Clinical and pathological features of patients with LARC
Among 392 patients with LARC treated with neoadjuvant therapy from the Chinese PLA General Hospital from 2010 to 2019, 42 patients were enrolled according to the criteria. Eventually, 37 patients were confirmed in the total internal cohort after excluding 5 samples with more than half missing values found in the process of protein identification and quantitation (Fig. 1). The cases in the total responders (TR), moderate responders (MR) and poor responders (PR) groups were 9 (24.32%), 17 (45.95%) and 11 (29.73%), respectively. The TR and PR groups were selected for further analysis of differentially expressed proteins (DEPs). Most patients in the TR and PR groups were aged over 53 years with mid to inferior rectal tumor (95%), non-mucinous adenocarcinoma (95%) disease. Clinical and pathological features of patients with LARC were detailed in Tables 1 and 2. There were no statistical differences between the TR and PR groups in age, gender, body mass index (BMI), chronic complications, tumor location and size, initial lateral lymph node dissemination, clinical TNM stage, nCRT time, or pathological related characteristics.
Results of hematology tests and serum tumor markers (CEA, CA 19-9, CA 125, and AFP) detected at the time of diagnosis before nCRT were also obtained, but no significant differences were observed between the TR and PR groups (Table 3).

Differentially expressed proteins between TR and PR group
In total, 3,998 proteins were identified through LC-MS/ MS-based proteomic analyses on FFPE biopsy samples from patients with LARC. As visualized in the volcano plot (Fig. 2a) and heatmap (Fig. 2c), 91 DEPs were found with statistical significance (p < 0.05) between the TR and PR group, including 45 up-and 46 downregulated proteins in the PR group compared with the TR group (Additional file 1: Table S1 and Additional file 2: Table S2). PLS-DA plot (Fig. 2b) shows that these DEPs levels were distinguished and clustered between the TR and PR groups.

Validation of potential biomarkers for response to nCRT
To assess the predictive value of the DEPs for nCRT response in patients with RC, an online ROC Plotter tool was utilized. From the ROC plotter, a total of 56 patients with RC receiving treatments of radiotherapy and capecitabine, as the external validation cohort, were divided into non-responders (n = 41) and responders (n = 15) according to the RECIST criteria. Based on the gene expression data of the external cohort, expression levels of heat shock proteins family A member 4 (HSPA4), nitrophenylphosphatase domain and non-neuronal SNAP25-like protein homolog 1 (NIPSNAP1) and spectrin beta, erythrocytic (SPTB) were validated to be statistically different between the responders and nonresponders (p = 0.026, p = 0.009, p = 0.013, respectively) with higher expression of HSPA4 and lower expressions of NIPSNAP1 and SPTB in the responders (Fig. 3a-c), which were in accordance with our findings from the internal discovery cohort. Fig. 1 Workflow of the total internal cohort in Chinese PLA General Hospital from 2010 to 2019. A total of 42 patients with LARC met the enrollment criteria and were divided into 3 groups based on TRG, namely total responders (TR), moderate responders (MR), and poor responders (PR). Samples from 37 patients passing quality control in protein identification were finally included in the discovery cohort for further data analysis. LARC, locally advanced rectal cancer; nCRT, neoadjuvant chemoradiation therapy; TME, total mesorectal excision; TRG, tumor regression grade Then, we further explored the variation tendencies of HSPA4, NIPSNAP1 and SPTB among different response groups in the total internal cohort, and we had confirmed the dose-dependent relationship between each of the three proteins and therapy response grades. Significant linear trends were found across the TR, MR and PR groups in terms of HSPA4 (p = 0.0494, slope = 0.3837) and SPTB (p = 0.0397, slope = 0.8759) (Fig. 3d, f ), indicating that the levels of HSPA4 and SPTB varied with the extent of nCRT response. Although not statistically different, a seemingly linear trend in the NIPSNAP1 level among these response groups was observed (p = 0.1459, slope = 0.7497) (Fig. 3e).

The relationship between the clinical features and the expressions of HSPA4, NIPSNAP1, and SPTB
The correlation analysis were performed in the total internal cohort to explore the relationship of the disease stage, as well as smoking status, with the expressions of HSPA4, NIPSNAP1, and SPTB. No significant correlation was found between cTNM staging and the expres-

Evaluation of HSPA4, NIPSNAP1, and SPTB as predictive markers for nCRT response
As significant linear trends were discovered in the levels of HSPA4 and SPTB across various responses to nCRT in patients with RC, the predictive performance of these potential markers in the TR and PR groups was assessed using ROC curve analysis. Detailed results are listed in Table 4. The three proteins had individual AUCs of ~ 0.8 in the internal discovery cohort (Fig. 4a-c) and ~ 0.7 in the external validation cohort from the ROC plotter database (Fig. 4d-f ). Intriguingly, the AUC of HSPA4 combined with SPTB dramatically increased to 0.980 (95% CI, 0.797-1.000, p < 0.0001) in the discovery cohort, with sensitivity and specificity of 90.91 and 100.00%, respectively (Fig. 4g). In the external validation cohort, the combination of HSAP4 and SPTB mRNA achieved an AUC of 0.741 (95% CI 0.607-0.849, p = 0.0048), with sensitivity and specificity of 73.33 and 78.05%, respectively (Fig. 4h).
Next, the MR group was introduced into the TR and PR groups, respectively, for ROC analysis to observe the predictive performance of HSPA4, NIPSNAP1, and SPTB, along with their combinations. As shown in Table 5, for TR versus MR & PR, the highest TPR reached up to 0.79 with AUC = 0.849 (p < 0.0001) when HSPA4 and SPTB were combined (Combination 2), which meant Combination 2 had the best identification ability for the total responders. For TR&MR versus PR results, the highest TNR reached 0.97 combining HSPA4, NIPSNAP1, and SPTB (Combination 4, AUC = 0.794, p = 0.0007), meaning that Combination 4 best identifid poor responders compared with others.

Correlation of the DEPs with drug sensitivity database
Based on the drug sensitivity data of the cancer cell lines in the CTRP and GDSC databases, the corresponding genes of the DEPs were validated for their correlation with 5-FU sensitivity in RC using the online GSCALite website tool. The results revealed that RASAL3, PTPN6, SYNCRIP, ARGLU1, SIN3A and DDX21 had higher expression in total responders which negatively correlated to 5-FU resistance (Fig. 5a), while REXO2, LAMC1, TPM1 and S100A6 had lower expression in the total responders showing positive correlation with 5-FU resistance (Fig. 5b).

Effects of the DEPs on oncogenic pathways in RC
To further understand the molecular mechanisms for the DEPs involved in tumorigenesis of RC, the GSCALite tool was used to examine the correlation between gene expression levels and regulation of three key signaling pathways which were closely related to therapeutic response, according to a pathway score calculated by GSCALite. Our results showed that the DEPs were highly associated with the activation or inhibition of apoptosis, cell cycle, and EMT pathways (Fig. 5c). Concerning cell cycle pathways, HSPA4, RUVBL2, NOP58, CLUH, HDAC2, GANAB, ETF1, and DNAJC7 were higher in the TR group and closely related to the activation of the cell cycle, while TPM1, GAA, and MAOA were found higher in the PR group and associated with the inhibition of the cell cycle. In addition, the apoptosis pathway could be activated by PSMB7 and inhibited by GAA. Lastly, POSTN and COL3A1 had higher levels in the PR group involved in activation of the EMT pathway.

Functional enrichment and PPI networks
Through the GSEA results, proteomics revealed that the TR group was enriched in gene sets related to cell cycle such as the "E2F targets" (NES = 2.02, p < 0.001) and "G2M checkpoint" (NES = 1.55, p = 0.007), while the PR group was enriched in EMT (NES = 1.91, p < 0.001) (Fig. 6a-c). Then, GO and KEGG enrichment analysis gave further insight into the biological functions of the DEPs. The top five terms of GO and KEGG annotations showed the DEPs were mainly associated with processes Fig. 4 ROC curves of HSPA4, NIPSNAP1, SPTB, as well as the combination of HSPA4 and SPTB. ROC curves of HSPA4, NIPSNAP1, SPTB, as well as the combination of HSPA4 and SPTB for predictive performance in the internal discovery cohort (n = 20; total responders = 9, poor responders = 11) (a-c, g) and the external validation cohort (n = 56, responders = 15, non-responders = 41) (d-f, h). ROC, receiver operating characteristic;. AUC, area under the curve; TPR, true positive rate; TNR, true negative rate including rRNA processing, Box C/D snoRNP complex, ATPase binding and ribosome biogenesis in eukaryotes ( Fig. 6d-g). As shown in the PPI network (Fig. 6h), 14 proteins were involved in RNA binding or processing processes, of which 12 proteins were higher in the TR group. The 10 most connected nodes ranked (in descending order) by cytoHubba were XPO1, COPG1, SPTB, SF3B3, HBB, SYNCRIP, NUP153, FBL, NOP58, and HSPA4.

Discussion
nCRT before TME is the standard treatment for patients with LARC, unfortunately, various patients could not achieve favorable responses to nCRT. To avoid futile treatments for non-responders, effective prediction of nCRT response would guide treatment decisions for patients with LARC, while also increasing the opportunity for responders to receive nCRT and perform organconserving surgery. Various studies have been conducted to identify predictive markers for response to nCRT in patients with LARC using proteomic methods. However, to the best of our knowledge, the present study was the first to identify proteins that could discriminate between good and poor responders in Chinese patients with LARC.
Through proteomic approach, 91 DEPs were screened from different nCRT response groups in FFPE biopsy tissues of patients with LARC to successfully discriminate total and poor responders by PLS-DA analysis. Further verification demonstrated the promising discrimination ability of HSPA4, NIPSNAP1, and SPTB for different nCRT responses in both internal and external cohorts. Significant linear trends found in HSPA4 and SPTB among the TR, MR, and PR groups revealed changing levels to therapeutic responses, which implied their potential predictive value for nCRT response.
Subsequently, we utilized ROC analysis to evaluate the predictive performance of the three proteins. Though each of them showed moderate performance in both cohorts, the AUCs of their various combinations distinctly improved discrimination between the total and poor responders, especially the combination of HSPA4 and SPTB whose AUC surprisingly reached 0.98 in our cohort. In clinical applications, a greater concern is placed on one marker that could accurately discriminate between total responders and poor responders in patients with LARC. Therefore, the MR group was further introduced into the ROC analysis to evaluate the predictive value of the three proteins and their different combinations for the entire treatment population. Similarly, the predictive performance of individual proteins and different combinations varied. For TR versus (MR & PR), the combination with the highest sensitivity (HSPA4 & SPTB) was selected so that more patients with total response would be identified and benefit from nCRT. For TR & MR versus PR, the combination with the largest specificity (HSPA4 & NIPSNAP1 & SPTB) was chosen so that patients with poor responses could be mostly identified for personalized their treatment regimens. Therefore, based on the actual requirement, optional combinations of these proteins were determined to improve the discrimination between total response and poor response to nCRT, which provided better guidance on clinical decisions. In summary, our results indicated that the predictive value of multiple combinations of HSPA4, NIPSNAP1, and SPTB were comparable or even superior to predictive markers reported by other studies [18][19][20][21], suggesting that they might be reliable predictive markers for nCRT response in patients with LARC.
To explore if the DEPs played a role in 5-FU sensitivity, we analyzed the drug response data from the GDSC and CTRP databases along with related published literatures. Relationship between genes/proteins and drug sensitivity/pathway activities in rectal cancer. a, b 5-FU resistance analysis of the differentially expressed proteins (DEPs) corresponding to genes based on GDSC/CTRP IC 50 drug data. Gene expression correlation with the drug sensitivity was determined using Spearman correlation. A positive correlation means the genes with high expression are resistant to the drug (fluorouracil or 5-fluorouracil), and vise verse. c The role of the DEPs in cancer related-pathways (using GSCALite). Reverse-phase protein array data from The Cancer Proteome Atlas are analyzed to calculate the correlation of genes with cancer-related pathways in rectal cancer. Genes corresponding to the DEPs with activation/inhibition effect on apoptosis, cell cycle and EMT pathways are shown. TR, total responders; PR, poor responders; EMT, epithelial mesenchymal transition; READ, rectal adenocarcinoma Only parts of the DEPs were verified to be associated with the sensitivity or resistance to 5-FU, whereas HSPA4, NIPSNAP1, and SPTB appear to not affect treatment response through 5-FU sensitization. For a better understanding of the roles of the DEPs and the molecular mechanisms underlying the different therapy responses, we next explored the DEP-related processes and pathways. From the GSEA results and the roles of DEPs in cancer related-pathways, we found proteomic differences between the total and poor responders involved in the regulation of the cell cycle and EMT pathways in RC, which were closely related to chemo-and radiotherapy resistance in cancer [22]. DNA damage is one of the major processes induced by both radiation and chemotherapy. In our results, the DEPs were significantly enriched in procedures including RNA binding and processing, and post-translational protein modification. It is known that radiation could cause greater post-translational protein modification with activated intracellular signaling pathways, thus leading to DNA damage. RNA processing might also be associated with the activation of DNA damage response, which in turn or in parallel triggered nucleolar and ribosomal stress [23]. Thus, the DEPs implicated in these processes might affect treatment efficacy via DNA damage and DNA damage repair pathways. Based on our results, we found that HSPA4 is involved in cell cycle activation in RC, while HSPA4 and SPTB were identified as two of the top 10 most connected nodes ranked by cytoHubba. Regrettably, we could not elucidate the machanism of NIPSNAP1 and SPTB, and the roles and underlying mechanism of all three proteins in nCRT response remained unclear.
Heat Shock Proteins (HSPs) are a large family of evolutionally conserved and ubiquitously expressed molecular chaperones. Aberrant levels of HSPs were validated in multiple cancer types and had been shown to be involved in the regulation of malignant progression including apoptosis, proliferation, angiogenesis, metastasis and immune responses [24][25][26][27][28][29]. HSPA4 belongs to the HSP110 family and is related to tumor progression and outcomes [30][31][32][33][34]. Futhermore, the levels of multiple HSPs including HSPA4 were significantly enhanced in hepatocellular carcinoma (HCC), CRC and head and neck cancer compared with normal tissues. In vitro experiments showed that HSPA4 modulate the proliferation and migration of HCC and CRC cells, which presumably underpins the prognostic implication of HSPA4 in CRC [32]. In CRC cell lines, HSPA4 knockdown suppressed cell proliferation and migration, and caused arrest in the G2-phase of the cell cycle along with increased levels of apoptosis by inhibiting the activation of the PI3K/Akt signaling pathway and reducing the cell cycle progression markers CCND1 and CDK6 [32]. In our study, HSPA4 was a hub gene and highly expressed in the TR group compared to the PR group, which was in line with the results of the external mRNA datasets. Using GSCALite, we found that HSPA4 was capable of activating the cell cycle in RC, in accordance with a previous CRC cell experiment [32]. Therefore, we assumed that HSPA4 increased the treatment sensitivity by promoting cell-cycle transitions of tumor cells, resulting in superior response to nCRT in the TR group with higher HSPA4 levels.
NIPSNAP1 a kind of mitochondrial matrix protein, could recruit autophagy receptors and play a role in PARKIN-dependent mitophagy in damaged mitochondria [35]. NIPSNAP1 deficiency was found to be related to the accumulation of reactive oxygen species (ROS) and mitophagy deficiency in the brain of zebrafish larvae [36]. Since ROS generated in mitochondria was a strong inducer of apoptosis during chemo-or radiotherapy, mitophagy could degrade dysfunctional mitochondria to decrease ROS production, which promoted cell survival and resisted apoptosis during chemotherapy in various chemo-resistant cancer cells [37][38][39][40][41]. In CRC, it had been reported that mitophagy contributed to doxorubicin resistance in HCT8 cancer stem cells, and inhibition of mitophagy enhanced doxorubicin sensitivity [42]. Thus, reversing mitophagy-mediated protective mechanisms might be one of many ways to reverse chemotherapy resistance. In this study, the NIPSNAP1 level was observed to be higher in the PR than in the TR group. Due to limited literature reports, few interactions of NIP-SNAP1 with other DEPs were found in our results. However, it could be speculated that, (1) since mitophagy had been linked to drug resistance in CRC and (2) NIPSNAP1 had been shown to affect mitophagy and ROS production in other types of cancer, that mitophagy impairment and elevated ROS production might exist in the TR group due to the lower level of NIPSNAP1, which partially explains (See figure on next page.) Fig. 6 Functional enrichment and protein-protein interaction (PPI). Gene set enrichment analysis (GSEA) results of the whole proteomic data. HALLMARK gene set terms "E2F_TARGETS" (a) and "G2/M_CHECKPOINT" (b) significantly (nominal p < 0.01) upregulated (red) and "EPITHELIAL_ MESENCHYMAL_TRANSITION" (c) downregulated (blue) in the TR versus PR group. The top five Gene Ontology (GO) analysis terms in the biological process (d), cellular component (e), and molecular function (f) categories of differentially expressed proteins between the TR and PR group. g The differentially expressed proteins were mapped to canonical pathways using the Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment. h The PPI network was assessed using the STRING database and visualized by Cytoscape. Color transition of the nodes is based on the ranking calculated by CytoHubba. NES, Normalized Enrichment Scores; TR, total responders; PR, poor responders their greater sensitivity to nCRT. SPTB is involved in erythrocyte membrane stability, and the mutations in this gene have been implicated in spherocytosis type 2 and hereditary elliptocytosis [43][44][45]. However, few studies have reported on the association of between SPTB and solid tumors.
To there, the exploration of molecular biomarkers in response to nCRT in RC mainly focused on proteomics, transcriptomics, DNA mutation and methylation, and single nucleotide polymorphisms (SNPs) [8]. We selected proteomics rather than transcriptomics to search biomarkers in the study, as the changes in protein could reflect the response to nCRT more directly compared with mRNA. Meta-analysis suggested that variation in mRNA levels is often a poor predictor of changes in protein abundance [46]. In addition, according to another proteomic study on CRC cell lines, proteomic data might provide better prediction of drug sensitivity in CRC when compared to genomic and transcriptomic profiles in informing personalized cancer treatment [47]. Although the mRNA level is not always consistent with protein abundance-probably due to the impact on the binding, processing or translation of mRNA, it was found that genes with stable mRNA and protein expression tend to have higher mRNA-protein correlation [48]. The three proteins in this study were identified and validated at both the protein and mRNA levels, indicating that they might be stable markers in patients with LARC. With respect to the interference of endogenous expression, consistent expression differences of the markers were found at protein and mRNA levels in the two distinct cohorts, and significant linear trends of identified markers were found across different treatment responses in the total internal cohort, suggesting the inference of endogenous expressions might be few. Additionally, we searched on public data resources and literature, and no report on endogenous expression of the three markers had been found in rectal cancer patients. Based on these findings, although we could not completely exclude the interference from endogenous expression, we speculated that endogenous expressions of the identified markers in specific patients occurred with a low probability. After validation in larger independent cohorts, protein markers could be applied to clinical use more promptly through mature protein detection technology (e.g., immunohistochemistry for biopsy samples) for the prediction of response to nCRT-which is more cost-effective and simpler to implement.
In addition to tissue-based proteomics, blood-based proteomics is also a common approach to research the response to nCRT in RC [8]. However, the results of blood-derived proteomics are not always satisfying. Proteins from tumor tissues are most likely subjected to interference when released into the blood, combined with multiple variables introduced by the MS procedure and sample processing-confounding differential proteins identification. We dissected tissue-based proteomics in search of protein markers to predict the response to nCRT in RC for two reasons: (1) tissue-based proteomics data directly reveal the differences in the tumor tissues themself; and (2) the differential proteins identified in tumor tissues could be verified using plasma samples from patients with LARC in a future study. Nevertheless, other similar studies had smaller sample sizes, inconsistent TRG grading, and different thresholds for screening differential proteins. Until now, few overlapping proteins have been found between studies, and most lack verification [49]. In the case of different evaluation systems and data types, we proved that specific proteins (HSPA4, NIPSNAP1, and SPTB) and their combinations had a favorable performance as predictive markers for nCRT response. The combination of the two markers, HSPA4 and SPTB, could achieve relatively higher AUCs in the internal discovery cohort (AUC = 0.980) and external validation cohort (AUC = 0.741), as well as in the total internal discovery cohort (TR versus MR & PR: AUC = 0.849, TR & MR versus PR: AUC = 0.836), with the TNR and TPR almost all above 0.7. Simultaneously, the combination of HSPA4 and SPTB also demonstrated good TPR (0.91) when comparing the good and moderate response group with the poor response group-potentially screening patients with poor benefit to nCRT to a greater extent and avoiding unnecessary pre-operative treatment. Next, for screening out patients who might achieve total response to nCRT more sensitively, the combination of NIPSNAP1 and SPTB could reach a TNR of 1.00 when comparing patients with total response to others. This might help more patients receive nCRT and get surgery opportunities with organ preservation. Additionally, the levels of the three proteins linearly varied with the degrees of therapy response. Thus, the three proteins, including their optional combinations, were considered viable biomarkers for the prediction of nCRT response and deserve further investigation for their predictive value. Furthermore, based on our findings, their underlying mechanisms and potential as targets for treatment sensitization should also be elucidated in future research.
This was a single-center retrospective study with limited cases. Due to the lack of definite diagnosis and unclear therapeutic efficacy, multiple patients were sufferering from LARC. Few rectal patients could receive an accurate diagnosis of LARC and received nCRT regardless. Therefore, it was crucial to shed light on the prediction of nCRT response for treatment decisions. Our research institution is part of a national hospital with patients from all over the country, which helped to offset the single population-although undeniably still a single center. As part of future studies, we continue to collect samples to validate biomarker feasibility in future multi-center studies with expanded sample sizes.

Conclusions
The results of the present study illustrate the difference in tissue proteomics between patients with LARC based on their response to nCRT. These are preliminary results based on a small cohort, however, the results suggest that the tumor tissue-derived proteins (HSPA4, SPTB and NIPSNAP1) and a combination of HSPA4 and SPTB, could achieve relatively high predictive ability for nCRT sensitivity. Therefore, these proteins and their optional combinations might be potential biomarkers to predict nCRT response in patients with LARC. Furthermore, the combination of HSPA4 and SPTB also demonstrated good TPR (0.91) between the poor response group and the others, which could be useful to screen patients with a lower likelihood to benefit from nCRT and avoid unnecessary pre-operative treatment. Moreover, the combination of NIPSNAP1 and SPTB could reach a TNR of 1.00 for patients with total response, which might help identify patients likely to achieve total response to nCRT. Our study provides insight into potential biomarker identification and pathways to determine nCRT response sensitivity in a Chinese cohort of patients with LARC.