A Genomic-Clinicopathologic Nomogram Predicts Survival for Patients with Laryngeal Squamous Cell Carcinoma

Background Long noncoding RNAs (lncRNAs), which have little or no ability to encode proteins, have attracted special attention because of their potential role in cancer. We aimed to establish a lncRNA signature and a nomogram incorporating genomic and clinicopathologic factors to improve the accuracy of survival prediction for laryngeal squamous cell carcinoma (LSCC). Methods An LSCC RNA-sequencing (RNA-seq) dataset and the matched clinicopathologic information were downloaded from The Cancer Genome Atlas (TCGA). Using univariable Cox regression and least absolute shrinkage and selection operator (LASSO) analysis, we developed a thirteen-lncRNA signature related to prognosis. On the basis of the multivariable Cox regression results, a nomogram integrating the genomic and clinicopathologic predictors was built. The predictive accuracy and discriminative ability of the inclusive nomogram were assessed by calibration curve and concordance index (C-index), and compared with the TNM staging system by C-index and receiver operating characteristic (ROC) analysis. Decision curve analysis (DCA) was conducted to evaluate the clinical value of the nomogram. Results Thirteen overall survival (OS)-related lncRNAs were identified, and the signature consisting of these thirteen lncRNAs effectively divided patients into high-risk and low-risk subgroups, with areas under the curve (AUCs) of 0.89 (3-year OS) and 0.885 (5-year OS). The independent factors derived from multivariable analysis to predict survival were margin status, tumor status, and the lncRNA signature, all of which were assembled into the nomogram. The calibration curve for survival probability showed that the nomogram predictions coincided well with actual observations. The C-index of the nomogram was 0.82 (0.77-0.87), and the AUC of the nomogram in predicting OS was 0.938, both of which were significantly higher than those of the traditional TNM stage. Decision curve analysis further demonstrated that the nomogram had a larger net benefit than TNM stage. Conclusion An inclusive nomogram for patients with LSCC, comprising genomic and clinicopathologic variables, generates more accurate estimates of survival probability than TNM stage alone, but more data are needed before the nomogram is used in clinical practice.

variations [7]. As a result, a significant proportion of patients with inaccurate staging may receive overtreatment or undertreatment. For instance, overstaging might subject a patient to needless adjuvant chemoradiotherapy; conversely, understaging is likely to result in recurrence or even death after surgery.
Hence, reliable and novel markers or models that improve the accuracy of prediction in LSCC patients are urgently needed to optimize treatment planning and benefit patients.
As previous genomic research has revealed, more than ninety percent of the human genome is actively transcribed into non-coding RNAs (ncRNAs) [8]. Conventionally, the ncRNA family is loosely classified into two categories based on molecular size: small non-coding RNAs (shorter than 200 nt; e.g., microRNAs) and long non-coding RNAs (longer than 200 nt; lncRNAs) [9]. Compared with protein-coding RNAs, the expression patterns of lncRNAs are more specific. A large number of studies have reported the diverse biological roles of lncRNAs in tumorigenesis, tumor progression, and metastasis [10]. LncRNAs can serve as new cancer biomarkers and represent a large pool of potential molecular drivers in human cancer [11]. In the past several years, lncRNA signatures have been reported to predict the prognosis of cancers, including head and neck cancer, cervical carcinoma and gastric cancer [12][13][14]. However, a lncRNA signature for predicting the overall survival (OS) of LSCC patients has not yet been reported.
In the present study, we hypothesized that an inclusive nomogram containing genomic and clinicopathological factors can improve the accuracy of survival prediction. By mining the lncRNA expression data in The Cancer Genome Atlas (TCGA), we identified lncRNAs that were significantly related to survival outcomes and then developed a lncRNA signature. An inclusive nomogram for predicting survival was established by further integrating the lncRNA signature with clinicopathological factors.
We assessed the predictive ability and clinical applicability of the nomogram and compared it with the TNM stage. In addition, we evaluated the predictive performance of the nomogram in clinical subgroups (advanced LSCC and early LSCC).

Methods
Collection of public data from TCGA
The LSCC RNA sequencing (RNA-seq) data set and the relevant clinicopathological information, including age, sex, smoking history, alcohol history, number of lymph nodes (LNs), number of positive LNs, lymph node ratio (LNR), margin status, tumor status, histologic grade, T stage, N status, TNM stage, mutation count, fraction of genome altered, and overall survival (OS) time, were obtained from the TCGA database (https://gdc.cancer.gov/). A total of 109 patients with complete follow-up data, recorded before April 14, 2019, were extracted. The clinical end point was OS, defined as the time from surgery to death. Patients who were alive at the last follow-up were treated as censored cases.
Given that the expression level of lncRNAs is relatively low compared with that of protein-coding RNAs, it is likely that some lncRNAs were not detected during sequencing. Considering this possibility, we defined a lncRNA as abundantly expressed when its expression level was above 0 in more than 50% of all samples. The final expression level of each lncRNA was represented as log2(x + 1) of the original expression level.

Construction and confirmation of a lncRNA signature
First, the moderated t-statistic method and the Benjamini-Hochberg procedure were used to identify lncRNAs differentially expressed between normal tissues and LSCC tissues, with a false discovery rate (FDR) < 0.05 and P < 0.05 as filtering criteria. Next, univariable Cox regression analysis was applied to select prognosis-related lncRNAs that were statistically significant (P < 0.01). After this primary filtering, least absolute shrinkage and selection operator (LASSO) analysis was performed to select candidate lncRNAs, with the penalty parameter tuned by 10-fold cross-validation [15]; a signature based on the selected lncRNAs was then developed.
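The abundance filter and the log2(x + 1) transform described in the data preparation step above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `filter_and_transform`, the dictionary layout and the toy values are assumptions made only for illustration.

```python
import math

def filter_and_transform(expr, min_fraction=0.5):
    """Keep abundantly expressed lncRNAs and log-transform their values.

    expr maps a lncRNA identifier to a list of raw expression values,
    one value per sample (hypothetical layout).
    """
    kept = {}
    for gene, values in expr.items():
        expressed = sum(1 for v in values if v > 0)
        # Keep a lncRNA only if it is expressed (> 0) in MORE than half of the samples.
        if expressed / len(values) > min_fraction:
            kept[gene] = [math.log2(v + 1) for v in values]
    return kept

# Hypothetical toy data, not taken from TCGA.
toy = {
    "lncA": [0.0, 3.0, 7.0, 1.0],  # expressed in 3 of 4 samples: kept
    "lncB": [0.0, 0.0, 5.0, 0.0],  # expressed in 1 of 4 samples: dropped
    "lncC": [0.0, 0.0, 2.0, 6.0],  # expressed in exactly 2 of 4 samples: dropped
}
result = filter_and_transform(toy)
```

Adding 1 before taking the logarithm keeps zero counts defined (they map to 0) while compressing the large dynamic range of sequencing counts.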
The risk score formula was generated by combining the expression levels of these prognosis-related lncRNAs, weighted by their respective LASSO regression coefficients. According to this formula, a risk score was calculated for each patient, and patients were classified into a high-risk or a low-risk group on the basis of the optimal cut-off value, which was chosen to maximize sensitivity and specificity on a (time-independent) receiver operating characteristic (ROC) curve. The survival difference between the high-risk and low-risk groups was then compared by Kaplan-Meier analysis (log-rank test). Stratified analysis based on various clinical characteristics was conducted to evaluate the discrimination ability of the lncRNA signature.
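As an illustration of the risk score and cut-off selection described above, the following sketch computes a coefficient-weighted score and then picks the cut-off that maximizes sensitivity + specificity - 1 (the Youden index). Reading "maximum sensitivity and specificity" as the Youden index is an assumption, and the coefficients, lncRNA names and patient values are hypothetical.

```python
def risk_score(expression, coefficients):
    """Linear predictor: sum of coefficient * expression over the signature lncRNAs."""
    return sum(coefficients[g] * expression[g] for g in coefficients)

def youden_cutoff(scores, died):
    """Return (cut-off, J) maximising J = sensitivity + specificity - 1.

    Patients with score >= cut-off are called high-risk.
    """
    n_pos = sum(died)
    n_neg = len(died) - n_pos
    best_cut, best_j = None, -1.0
    for cut in sorted(set(scores)):
        tp = sum(1 for s, d in zip(scores, died) if d and s >= cut)
        tn = sum(1 for s, d in zip(scores, died) if not d and s < cut)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j

# Hypothetical coefficients and log-transformed expression values.
coefficients = {"lncA": 0.5, "lncB": -0.25}
patients = [
    {"lncA": 2.0, "lncB": 4.0},
    {"lncA": 4.0, "lncB": 0.0},
    {"lncA": 1.0, "lncB": 4.0},
    {"lncA": 6.0, "lncB": 2.0},
]
died = [0, 1, 0, 1]  # 1 = death observed, 0 = alive at last follow-up

scores = [risk_score(p, coefficients) for p in patients]
cutoff, youden_j = youden_cutoff(scores, died)
groups = ["high" if s >= cutoff else "low" for s in scores]
```

This treats the outcome as a simple binary label, which matches a time-independent ROC curve; a time-dependent analysis would need to account for censoring.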

Function enrichment analysis of the prognostic lncRNAs
In the TCGA dataset, Pearson correlation analysis was performed between the expression levels of the identified lncRNAs and those of the protein-coding genes (mRNAs). A correlation with P < 0.001 and a correlation coefficient > 0.4 was defined as significant. The potential biological functions of the lncRNA target genes were investigated using Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses.
DAVID (http://david.abcc.ncifcrf.gov/, version 6.8) [16], a common bioinformatics tool, was applied to investigate the biological processes associated with the selected lncRNAs. GO terms and KEGG pathways with an FDR-adjusted P value < 0.05 were considered significantly enriched functional annotations.
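For readers unfamiliar with FDR control, the following sketch shows how Benjamini-Hochberg adjusted P values can be computed and thresholded at 0.05. It illustrates the general procedure only; whether DAVID reports exactly these values is not stated here, and the P values below are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values for four enrichment terms.
raw_p = [0.01, 0.04, 0.03, 0.5]
adj_p = benjamini_hochberg(raw_p)
significant = [p < 0.05 for p in adj_p]
```

In this toy example only the first term passes the threshold, although three of the four raw P values are below 0.05, which shows how the adjustment guards against false positives when many terms are tested.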

Genomic-clinicopathologic nomogram
To establish a genomic-clinicopathologic nomogram, we carried out univariate and multivariate Cox regression analyses to identify clinical risk parameters associated with survival. The lncRNA-based signature, together with these risk parameters, was then used to develop a comprehensive nomogram.
The performance of the model was evaluated by calibration and discrimination. Discrimination is the model's ability to differentiate between patients who survived and those who did not. The concordance index (C-index) was computed to assess discrimination. In addition, we illustrated discrimination by dividing the data set into three groups according to the scores generated by the nomogram and plotting a Kaplan-Meier curve for each group. Calibration was evaluated graphically with calibration curves that plot the nomogram-predicted probabilities against the observed outcomes.
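The C-index mentioned above can be made concrete with a minimal sketch of Harrell's concordance index for right-censored data. This is a simplified illustration and not the routine of the "rms" package: pairs with tied survival times are ignored, and the follow-up times and risk scores are hypothetical.

```python
def harrell_c_index(times, events, risk):
    """Harrell's C for right-censored survival data (simplified).

    times  : follow-up time per patient
    events : 1 if death was observed, 0 if censored
    risk   : predicted risk score (higher = worse expected survival)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i died before patient j's follow-up time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0   # the earlier death had the higher risk
                elif risk[i] == risk[j]:
                    concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Hypothetical follow-up data for four patients.
times = [5, 10, 15, 20]
events = [1, 0, 1, 0]
risk = [0.9, 0.3, 0.1, 0.5]
c_index = harrell_c_index(times, events, risk)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk; here three of the four comparable pairs are ordered correctly.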
Furthermore, we used ROC analysis to compare the discrimination ability of the nomogram with that of the TNM stage and the lncRNA signature. Decision curve analysis (DCA) was used to assess the clinical usability and net benefit of the predictive model in comparison with traditional TNM staging and the lncRNA signature [17]. Finally, we evaluated the predictive accuracy of the comprehensive nomogram in clinical subgroups (advanced LSCC and early LSCC).
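The net benefit that underlies decision curve analysis can be sketched as follows for a binary outcome at a single threshold probability. This is a simplified illustration under the assumption of no censoring (the survival version used for the analysis additionally handles censored follow-up); the predicted risks and outcomes are hypothetical.

```python
def net_benefit(predicted_risk, died, threshold):
    """Net benefit of treating patients whose predicted risk is >= threshold."""
    n = len(died)
    tp = sum(1 for p, d in zip(predicted_risk, died) if p >= threshold and d)
    fp = sum(1 for p, d in zip(predicted_risk, died) if p >= threshold and not d)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - fp / n * threshold / (1 - threshold)

def net_benefit_treat_all(died, threshold):
    """Reference strategy: treat every patient regardless of the model."""
    prevalence = sum(died) / len(died)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Hypothetical predicted death risks and observed outcomes.
predicted = [0.9, 0.8, 0.3, 0.6, 0.1]
died = [1, 1, 0, 0, 0]
threshold = 0.5

nb_model = net_benefit(predicted, died, threshold)
nb_all = net_benefit_treat_all(died, threshold)
nb_none = 0.0  # treating nobody has zero net benefit by definition
```

A decision curve repeats this calculation over a range of threshold probabilities and plots the net benefit of each model alongside the treat-all and treat-none strategies.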

Statistical analysis
Categorical variables are provided as proportions (%). Continuous variables are described as medians (interquartile ranges [IQRs]) if the distribution was non-normal, and as means (standard deviations [SDs]) if the distribution was normal.
Where values were missing for some of the potential predictors, the missing data were imputed, because complete-case analysis would reduce statistical power and could bias the results [18].
Multiple imputation (MI) was used to fill in the missing data, as the data were deemed to be missing at random after their missingness patterns were analyzed [19]. We used the Markov chain Monte Carlo (MCMC) method to perform MI and selected five iterations to account for possible simulation errors.
The LASSO algorithm was run with the "glmnet" package, and ROC analysis was performed with the "timeROC" and "survivalROC" packages. The nomogram and calibration plots were produced with the "rms" package, and DCA was performed with "stdca.R". SPSS Statistics 22.0 and R software (version 3.5.2) were used for the statistical analysis. A two-sided P < 0.05 was considered statistically significant.

Demographic parameters and OS outcome of LSCC patients
In the current study, 109 LSCC patients with available lncRNA data and clinicopathological characteristics were included. The basic clinicopathological features of these patients are summarized in Table 1.
Using the ROC curve to generate the optimal cut-off value for the risk score, patients were categorized into a high-risk group and a low-risk group. As shown in Figure 2, patients with a high risk score were more likely to die and had a shorter OS time than patients with a low risk score (19.74 vs 108.9 months, HR = 5.79, 95% CI: 3.18-10.54, P < 0.0001). The lncRNA signature showed strong predictive performance, with an AUC of 0.89 for 3-year OS and 0.885 for 5-year OS (Figure 2C). Additionally, the 13-lncRNA signature was analyzed by stratification in subsets of patients with different clinical characteristics. When patients were stratified according to clinical variables (tumor size, node status, TNM stage), the 13-lncRNA signature remained a clinically and statistically significant prognostic model (Figure S1).

Functional prediction of the 13 lncRNAs
To explore the potential function of the 13 lncRNAs, a total of 237 protein-coding genes (mRNAs) that were significantly correlated with at least one of the 13 lncRNAs (P < 0.001 and Pearson coefficient > 0.4) were identified and deemed eligible for pathway enrichment analysis. The 13 lncRNAs were mainly related to human papillomavirus infection, focal adhesion, and protein digestion and absorption (Figure 3A). KEGG pathway analysis revealed that the target genes related to the 13 lncRNAs were mainly enriched in metalloendopeptidase activity, extracellular matrix structural constituent and metallopeptidase activity (Figure 3B).

Development of genomic-clinicopathologic nomogram predicting OS in LSCC patients
Using univariate Cox analysis, we identified four variables associated with survival probability: sex, margin status, tumor status and the lncRNA signature (Table 2). Multivariable analysis confirmed that margin status, tumor status and the lncRNA signature were independent risk factors for OS. Based on the multivariable analysis of OS, a genomic-clinicopathologic nomogram was built to predict OS at 3 and 5 years (Figure 4). The C-index of the nomogram for OS prediction was 0.82 (0.77-0.87) (Table 3). The calibration plots of OS probabilities at 3 and 5 years showed good agreement between the nomogram predictions and the actual observations (Figure S2). Additionally, Kaplan-Meier analysis was performed to assess the discrimination ability of the nomogram for OS, and a statistically significant difference was found among the three subgroups (log-rank P < 0.0001) (Figure S3).

Comparison of predictive performance and clinical usefulness between nomogram and TNM stage or lncRNA signature
To evaluate the predictive ability of the nomogram, we compared it with the AJCC TNM stage model and the lncRNA signature. As shown in Table 3, the C-index of the nomogram was higher than that of the TNM stage (0.53 (0.45-0.61)) and the lncRNA signature (0.78 (0.71-0.85)). The likelihood ratio test, the linear trend χ2 test and the Akaike information criterion all indicated that the nomogram performed better than the TNM stage or the lncRNA signature. ROC analysis also indicated that the nomogram (AUC 0.938) had higher predictive efficiency than the TNM stage (AUC 0.533) or the lncRNA signature (AUC 0.847) (Figure 5A). Finally, DCA was conducted to compare the clinical usefulness of the nomogram with that of the traditional TNM stage and the lncRNA signature. Plotting the threshold probability of death (x-axis) against the net benefit of risk stratification using the model (y-axis), DCA showed that the inclusive nomogram was superior to traditional TNM staging and the lncRNA signature (Figure 5B).
Furthermore, ROC analysis was conducted in clinical subgroups (advanced LSCC and early LSCC) to assess the discrimination ability of the nomogram. As shown in Figure 6, the nomogram presented good discrimination ability in the advanced LSCC subgroup (AUC 0.951; Figure 6A) and the early LSCC subgroup (AUC 0.811; Figure 6B). Moreover, according to the best cut-off values, the patients in each subgroup were classified into a low-risk group and a high-risk group. Notably, the low-risk group was more likely to survive in both subgroups (Figure 7A and Figure 7B).

Discussion
By analyzing the LSCC RNA sequencing (RNA-seq) data set and the relevant clinical parameters of 109 LSCC patients from TCGA, we identified thirteen lncRNAs related to OS. On the basis of these lncRNAs, we developed a lncRNA signature that could accurately categorize patients into high-risk and low-risk groups. Additionally, we built an inclusive nomogram integrating the lncRNA signature and clinicopathologic variables to predict survival in LSCC patients who underwent surgical resection. The nomogram effectively predicted survival, with a bootstrap-corrected C-index of 0.73 and an AUC of 0.938, and showed better predictive ability and clinical usability than TNM stage alone.
An increasing number of studies have found that lncRNAs may be exploited as effective biomarkers for the diagnosis, progression and prognosis of LSCC [20][21][22]. Based on a comprehensive lncRNA profile for LSCC, Shen et al. [20] identified AC026166.2-001 and RP11-169D4.1-001 as new lncRNAs that accurately diagnosed LSCC, were independent prognostic factors and may be potential therapeutic targets. A lncRNA microarray study by Chen et al. [21] found that lncRNA AC 008440.10 was significantly related to LSCC stage, lymph node metastasis (LNM) and survival time.
Recently, Zhao et al. [22] confirmed that LINC00668 was up-regulated in LSCC and probably promoted the malignant phenotypes of cells in vitro; the authors deduced that LINC00668 may enhance the stability of RAB3B mRNA by binding its 3'UTR. Notably, He et al. [23], using data from the open Gene Expression Omnibus (GEO), reported that a module of 18 mRNAs and one lncRNA was correlated with disease-free survival (DFS) in LSCC patients and effectively divided patients into high-risk and low-risk groups with different DFS outcomes, independent of patient age and tumor grade. Similarly, Wu et al. [24], also using data from GEO, constructed a two-lncRNA signature, comprising RP11-169K16.4 and RP11-107E5.3, to predict recurrence in patients with laryngeal carcinoma and confirmed that it was an independent predictor in these patients. These studies suggest the potential clinical value of lncRNAs in improving prognosis prediction for LSCC. However, a lncRNA signature predicting the overall survival (OS) of LSCC patients has not yet been reported.
Hence, in the current study, using the TCGA database, which contains large-scale lncRNA expression data, we aimed to identify OS-related lncRNAs and establish a lncRNA signature that may help to tailor treatment for LSCC patients in the era of precision medicine.
To our knowledge, this is the first study to construct an inclusive nomogram combining a lncRNA signature and clinicopathologic factors for predicting survival probability in patients with LSCC. We built a lncRNA signature consisting of AC007907.1, AC025419.1, AC078993.1, AC090241.2, AL158166.1, AL355974.2, AL596330.1, HOXB-AS4, KLHL6-AS1, LHX1-DT, LINC00528, LINC01436 and TTTY14, which effectively classified patients into a high-risk group with shorter OS and a low-risk group with longer OS. In stratified analysis, the lncRNA signature showed good discrimination ability regardless of tumor size, node status and TNM stage. Additionally, we identified three independent predictors, namely margin status, tumor status and the lncRNA signature, all of which were incorporated into the nomogram. In terms of homogeneity, discrimination and risk stratification, the nomogram outperformed the TNM staging system in predicting survival probability. The advantage of the current nomogram is that it integrates genomic and clinicopathological variables that are important for predicting survival risk but are not captured by the TNM staging system. Remarkably, the DCA results showed that survival-related treatment decisions based on the nomogram led to more net benefit than decisions based on the TNM stage or the lncRNA signature, or than treating either all patients or none. Taken together, the present nomogram may be clinically useful for clinicians in tailoring survival-associated treatment decisions.
It is also worth mentioning that an important feature of our comprehensive nomogram may be its ability to stratify patients within clinical subgroups, including early LSCC and advanced LSCC. Patients diagnosed with early LSCC are generally considered to have a low risk of death and therefore do not receive adjuvant treatment after radical resection. Nevertheless, some patients in this clinically low-risk subgroup (early LSCC) have a high risk of death and are likely to benefit from adjuvant treatment or an intensive follow-up plan. Likewise, patients diagnosed with advanced LSCC are usually regarded as having a high risk of death and receive adjuvant therapy after laryngectomy. Nevertheless, some patients in this clinically high-risk subgroup (advanced LSCC) have a low risk of death and may not benefit from adjuvant therapy or an intensive follow-up plan. Accurately predicting the survival risk of individual patients is therefore an arduous challenge. Encouragingly, our nomogram presented good discrimination capacity in both the early and the advanced LSCC subgroups. Hence, our nomogram may benefit the patients with early LSCC who are in fact at high risk of death and the patients with advanced LSCC who are in fact at low risk of death.
Consistent with previous studies, margin status was significantly associated with survival among patients with LSCC in the present study [29,30]. Tumor status, which was an independent risk factor for OS in our study, has not been reported in published LSCC trials as being related to prognosis. However, tumor status has been confirmed as an independent prognostic factor for survival in hepatocellular carcinoma (HCC) [31]. Additionally, male sex was positively associated with OS in the univariate Cox analysis, in contrast to previous studies in which male sex indicated a poor prognosis for LSCC [32,33]. Nevertheless, the effect of sex on OS was not statistically significant in the multivariable Cox analysis. In addition to these clinical factors, as expected, the lncRNA signature was an effective independent prognostic factor for patients with LSCC.
Although our nomogram demonstrated impressive performance in LSCC survival prediction, our study has specific limitations. First, the nomogram is based only on the TCGA database, which has a limited sample size for LSCC, and is not yet suitable for general application before the predictive model is validated in an independent testing cohort. External, multicenter prospective cohorts with large sample sizes are therefore still needed to validate the clinical application of our model. Second, missing variables were a source of weakness in this evaluation. Important prognostic parameters for LSCC patients, such as extracapsular spread [34,35], lymphovascular invasion status [35], perineural invasion [35] and human papillomavirus (HPV) status [36,37], were not well recorded in the TCGA database. Notably, our functional enrichment analysis found that the prognostic lncRNAs were significantly associated with human papillomavirus. Published studies and meta-analyses indicated that HPV-positive laryngeal cancer patients are sensitive to radiotherapy and chemotherapy and showed inferior survival [36,37]. Hence, we recommend that future studies assess the added value of those factors in a multivariable prediction model to improve the accuracy of prediction in LSCC patients. Third, our selection of factors was limited to those available in our database. Because the database is anonymized, we cannot extend it with characteristics such as race, insurance status, comorbidity, hemoglobin level, albumin, tumor hypoxia and TP53 mutation, which are frequently reported prognostic factors in patients with LSCC [38][39][40]. Further efforts to incorporate more patient-specific, tumor-specific and molecular factors will potentially help to improve the performance of the present model. Fourth, we did not explore the underlying biological functions and pathways of the prognostic lncRNAs, so further studies are needed to uncover the related mechanisms.

Conclusions
We have built a comprehensive nomogram, based on the TCGA database and incorporating genomic and clinicopathologic factors, for the prediction of survival in patients with LSCC. The nomogram is significantly better than TNM stage alone in terms of predictive value and clinical usability. Importantly, the nomogram presented good discrimination ability in both the early LSCC and the advanced LSCC subgroups.
