Plasma-based lipidomics reveals potential diagnostic biomarkers for esophageal squamous cell carcinoma: a retrospective study

Background Esophageal squamous cell carcinoma (ESCC) is highly prevalent and has a high mortality rate. Traditional diagnostic methods, such as imaging examinations and blood tumor marker tests, cannot accurately diagnose ESCC because of their low sensitivity and specificity. Esophageal endoscopic biopsy, which is considered the gold standard, is not suitable for screening because of its invasiveness and high cost. Therefore, this study aimed to develop a convenient and low-cost diagnostic method for ESCC using plasma-based lipidomics analysis combined with machine learning (ML) algorithms. Methods Plasma samples from a total of 40 ESCC patients and 31 healthy controls were used for the lipidomics study. Untargeted lipidomics analysis was conducted by liquid chromatography-mass spectrometry (LC-MS). Differentially expressed lipid features were filtered based on multivariate and univariate analysis, and lipid annotation was performed using MS-DIAL software. Results A total of 99 differential lipids were identified, with 15 up-regulated lipids and 84 down-regulated lipids, suggesting their potential as diagnostic targets for ESCC. In the single-lipid plasma-based diagnostic model, nine specific lipids (FA 15:4, FA 27:1, FA 28:7, FA 28:0, FA 36:0, FA 39:0, FA 42:0, FA 44:0, and DG 37:7) exhibited excellent diagnostic performance, with an area under the curve (AUC) exceeding 0.99. Furthermore, multiple lipid-based ML models also demonstrated comparable diagnostic ability for ESCC. These findings indicate that plasma lipids are promising diagnostic biomarkers for ESCC.


INTRODUCTION
Esophageal cancer (EC) is the eighth most prevalent malignancy in the world and the sixth leading cause of cancer-related death (Morgan et al., 2022). Histologically, EC can be classified into two distinct subtypes, esophageal squamous cell carcinoma (ESCC) and esophageal adenocarcinoma (EA). The former accounts for 90% of all cases, and developing countries bear the burden of 80% of global cases (Liang, Fan & Qiao, 2017). Ongoing research has identified alcohol abuse and smoking as the two most definitive risk factors for ESCC (Reichenbach et al., 2019), and other uncertain risk factors include radiation and pesticide exposure, sedentary lifestyle, and diet with low-fiber intake (Codipilly & Wang, 2022). ESCC progresses rapidly, carries a bleak prognosis, and exhibits a high mortality rate, with a five-year survival rate of less than 20%. Therefore, it is vital to implement better management for ESCC patients.
Early detection and diagnosis are effective strategies to reduce the mortality of ESCC. However, early-stage ESCC often lacks noticeable symptoms and is therefore easily overlooked. The gold standard of diagnosis for ESCC is esophageal endoscopic biopsy (Liang, Fan & Qiao, 2017). Although endoscopy is highly sensitive, its considerable cost and invasiveness lead to suboptimal patient compliance and poor cost-effectiveness, so it is not suitable for ESCC screening in non-high-risk areas within China (Zhu et al., 2020). Moreover, tumor markers such as CEA, CA125 and CA19-9 can serve only as auxiliary diagnostic tools, because they are also elevated in the blood of patients with other malignancies or inflammation of the digestive tract. Hence, there is an urgent need for accurate, convenient, and less invasive diagnostic methods for ESCC.
Metabolic disorders, including those of carbohydrate, amino acid, nucleotide, and lipid metabolism, play crucial roles in tumorigenesis (Huang et al., 2020; Kaushik & De Berardinis, 2018; Schmidt et al., 2021). Metabolomics has emerged as a powerful tool for identifying metabolic alterations in various diseases (Li et al., 2021). Lipidomics, as a branch of metabolomics, has gained traction in cancer research due to the detection of dysregulated lipid metabolism in tumors, including ESCC (Liang et al., 2021; Yuan et al., 2021). However, the large number of differential lipids detected by lipidomics makes it challenging to identify the most diagnostic ones for tumors. Machine learning (ML) algorithms, known for their data processing capabilities, can effectively select relevant lipid metabolism features and construct diagnostic models for tumors (Ambale-Venkatesh et al., 2017; Kourou et al., 2015; Yuan et al., 2021).
In this study, plasma-based lipidomics was conducted in ESCC patients and healthy controls using liquid chromatography-mass spectrometry (LC-MS) to find potential lipid biomarkers with diagnostic value in ESCC. Further, ML and lipidomics results were combined to develop a diagnostic model for early ESCC diagnosis, aiming to explore a novel approach in this area.

MATERIALS AND METHODS
Participants and sample collection
This retrospective study analyzed plasma samples obtained from Zhejiang Cancer Hospital (Hangzhou, China) between December 2010 and December 2012. Plasma samples were collected from 40 pathologically diagnosed ESCC patients and 31 healthy controls (HCs). Table 1 presents the basic clinical information of the participants. Blood was collected from individuals who had fasted overnight and transferred into vials pre-treated with the anticoagulant reagent (ethylenediaminetetraacetic acid disodium potassium salt). Plasma was obtained by centrifuging the blood at 2,400 × g for 8 min. The samples were then stored at −80 °C until analysis. All procedures involving human participants were conducted in accordance with the ethical standards set by the Ethics Committee of Zhejiang Cancer Hospital (IRB-2019-66), following the principles of the 1964 Helsinki Declaration and its subsequent amendments or comparable ethical standards. Furthermore, since the samples used in our study came from the biobank of Zhejiang Cancer Hospital, the Medical Ethics Committee of Zhejiang Cancer Hospital waived the need for informed consent.

Lipid extraction from plasma
Plasma lipid extraction was conducted as described previously (Yang et al., 2020), with some modifications. Briefly, 300 µL of chilled methanol was added to 40 µL of plasma, followed by mixing for 1 min. Then, 1 mL of methyl tert-butyl ether (MTBE) was added and the mixture was shaken at a frequency of 60 Hz for 1 h at room temperature. After that, 250 µL of water was added, and the mixture was incubated on ice for 10 min. After separation by centrifugation at 16,200 × g at 4 °C for 15 min, an aliquot of 400 µL of the upper organic phase was transferred to a new tube and dried using a concentrator. The dried residue was reconstituted in 80 µL of acetonitrile (ACN)-isopropanol (IPA)-water (65:30:5, v/v/v), followed by centrifugation at 16,200 × g at 4 °C for 15 min, and an aliquot of 60 µL of supernatant was transferred to a sample vial. Finally, 5 µL of supernatant was injected for liquid chromatography-mass spectrometry (LC-MS) analysis.
The pooled quality control (QC) plasma samples were generated by combining equal aliquots of plasma from each individual sample, which were then dispensed into 40 µL volumes.The extraction process employed for these pooled samples was identical to that used for the individual sample pretreatment.

LC-MS analysis
LC-MS analysis was conducted according to previous research (Yang et al., 2022). In brief, an UltiMate 3000 UHPLC system coupled with a Q Exactive Orbitrap mass spectrometer (both from Thermo Fisher Scientific, Waltham, MA, USA) was used for lipidomics analysis. Chromatographic separation was carried out on an Acquity UPLC BEH C18 column (2.1 mm × 100 mm, 1.8 µm, Waters, Milford, MA, USA). Solvent A consisted of a mixture of ACN/water (3:2, v/v) containing 0.1% (v/v) formic acid and 10 mM ammonium acetate, while solvent B was composed of IPA/ACN (9:1, v/v) with the same additives. The flow rate was 0.3 mL/min, and the column temperature was set at 50 °C. The elution gradient was as follows: 0.0-1.5 min, 32% B; 1.5-15.5 min, 32%-85% B; 15.5-15.6 min, 85%-97% B; 15.6-18.0 min, 97% B; 18.0-18.1 min, 97%-32% B; 18.1-20.0 min, 32% B. The settings for the mass spectrometer included a capillary voltage of 3.0 kV and a capillary temperature of 300 °C. The sheath gas flow rate was set to 50 Arb. The auxiliary gas had a flow rate and temperature of 15 Arb and 320 °C, respectively. The scan range was set at m/z 100-1200. The full scan MS had a resolution of 70,000 and an AGC target of 3 × 10^6. The data-dependent MS/MS had a resolution of 17,500 and an AGC target of 1 × 10^5. The normalized collision energy was set to 30, 40, and 50 eV.
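The elution program above can be written down as a table of breakpoints. The following is a minimal sketch, assuming that the mobile-phase composition changes linearly between consecutive breakpoints (the text gives only the start and end of each segment); the names GRADIENT and percent_b are illustrative and do not come from the instrument software.

```python
# Gradient breakpoints (time in min, % solvent B) transcribed from the text.
GRADIENT = [
    (0.0, 32.0),
    (1.5, 32.0),
    (15.5, 85.0),
    (15.6, 97.0),
    (18.0, 97.0),
    (18.1, 32.0),
    (20.0, 32.0),
]


def percent_b(t_min):
    """Return % solvent B at time t_min, assuming linear ramps (an assumption)."""
    if not GRADIENT[0][0] <= t_min <= GRADIENT[-1][0]:
        raise ValueError("time outside the 0-20 min run")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            frac = (t_min - t0) / (t1 - t0)
            return b0 + frac * (b1 - b0)


# Halfway through the main ramp (1.5-15.5 min) the sketch gives 58.5% B.
midpoint_b = percent_b(8.5)
```

Such a table makes it easy to check that the run ends at the starting composition (32% B), which is the re-equilibration step before the next injection.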
The analytical procedure employed a full scan mode to collect data from all samples in the batch. For qualitative quality control (QC) samples, data-dependent acquisition (DDA) mode was utilized. To ensure consistent performance and accuracy during the analysis, QC samples were interspersed within the sample injection sequence. The sequence commenced with three consecutive QC samples, followed by the inclusion of one QC sample every ten samples. The sequence concluded with another three consecutive QC samples. This approach helped to correct for mass spectrometry signal fluctuations and maintain reliable data quality throughout the analysis.
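The injection order described above can be sketched in a few lines. This is an illustration only: the sample labels are hypothetical, and the choice to skip a periodic QC that would fall directly before the closing QC block is an assumption, since the text does not specify that case.

```python
def build_injection_sequence(sample_ids, qc_every=10, n_bracket=3):
    """Return an injection order with bracketing and periodic QC injections."""
    sequence = ["QC"] * n_bracket            # three QC injections at the start
    for i, sample_id in enumerate(sample_ids, start=1):
        sequence.append(sample_id)
        # One QC after every `qc_every` study samples, except at the very end
        # (assumption: the closing QC block already covers that position).
        if i % qc_every == 0 and i < len(sample_ids):
            sequence.append("QC")
    sequence.extend(["QC"] * n_bracket)      # three QC injections at the end
    return sequence


# Hypothetical labels S1..S71, matching the 71 plasma samples of the study.
samples = [f"S{i}" for i in range(1, 72)]
sequence = build_injection_sequence(samples)
```

With these assumptions, 71 study samples yield 84 injections, 13 of which are QC samples. The periodic QC injections are what a drift correction such as QC-RFSC relies on.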

Metabolomics data analysis
Data analysis was performed according to a previous article (Yang et al., 2022) with some modifications. Briefly, ProteoWizard's msconvert tool (https://proteowizard.sourceforge.io/download.html) was used to convert RAW format data into mzXML format. The R package xcms was then employed for detecting and extracting ion features, including peak picking and retention time correction. To correct signal shifts, QC-based random forest signal correction (QC-RFSC) was applied using the R package statTarget. Subsequently, ion features were filtered. In this step, variables were retained if they had a non-zero value in at least 80% of samples within any single group, and variables with a relative standard deviation (RSD) greater than 30% in QC samples were excluded. Missing values were imputed using the K-nearest neighbors (KNN) algorithm. Prior to chemometrics analysis, the detected ions in each sample belonging to the same class were normalized by setting the sum of their peak areas to 100,000. This filtering retained only consistent and reproducible features for further analysis, improving the robustness and reliability of the lipidomics data.
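The filtering and normalization rules above can be illustrated with a short, dependency-free sketch. It is not the statTarget or xcms implementation; the function names are hypothetical, and the normalization is interpreted here as per-sample total-sum scaling, which is an assumption about the wording in the text.

```python
from statistics import mean, stdev


def keep_feature(values_by_group, qc_values, min_present=0.8, max_rsd=30.0):
    """Apply the 80% presence rule and the QC RSD cut-off to one ion feature."""
    # Retain if non-zero in at least 80% of samples within any single group.
    present_ok = any(
        sum(v > 0 for v in vals) / len(vals) >= min_present
        for vals in values_by_group.values()
    )
    qc_mean = mean(qc_values)
    if qc_mean == 0:
        return False
    # Relative standard deviation (%) across the pooled QC injections.
    rsd = 100.0 * stdev(qc_values) / qc_mean
    return present_ok and rsd <= max_rsd


def normalize_to_total(peak_areas, total=100_000.0):
    """Scale one sample's peak areas so that they sum to `total`."""
    area_sum = sum(peak_areas)
    return [total * area / area_sum for area in peak_areas]


# Toy intensities (made up for illustration only).
kept = keep_feature({"ESCC": [1, 2, 3, 4, 0], "HC": [0, 0, 0, 1, 1]}, [10, 11, 9])
scaled = normalize_to_total([1.0, 3.0])
```

In the toy example the feature is present in 4 of 5 ESCC samples (80%) and has a QC RSD of 10%, so it is kept; the two peak areas are scaled to 25,000 and 75,000.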
Train set samples were used to discover the differentially expressed plasma lipids between ESCC and HC groups. Firstly, unsupervised principal component analysis (PCA) was employed to visualize the overall separation trend of all samples based on the ion features. Subsequently, a supervised partial least squares discriminant analysis (PLS-DA) was utilized to assess the classification ability of these ion features, yielding a variable importance in projection (VIP) value for each ion feature. Furthermore, the statistical significance of the ion features between the ESCC and HC groups was evaluated using a two-tailed Student's t-test, and the Benjamini-Hochberg false discovery rate (FDR) was also calculated. The criteria for defining differentially expressed ion features were as follows: VIP >1.0, adjusted p-value (FDR) <0.05, and fold change (FC) >1.50 or <0.667.
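The selection criteria above combine three quantities per ion feature. The sketch below shows the Benjamini-Hochberg adjustment and the final threshold test in plain Python; the VIP values and raw p-values are assumed to come from a separate PLS-DA and t-test step, and the function names are illustrative.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def is_differential(vip, fdr, fold_change):
    """Criteria from the text: VIP > 1.0, FDR < 0.05, FC > 1.50 or < 0.667."""
    return vip > 1.0 and fdr < 0.05 and (fold_change > 1.50 or fold_change < 0.667)


# Toy p-values (made up for illustration only).
fdr_values = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

For the four toy p-values the adjusted values are 0.02, 0.04, 0.04, and 0.02. Note that the fold-change bounds are reciprocal (1/1.5 ≈ 0.667), so up- and down-regulation are treated symmetrically.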
Lipids were annotated as previously described (Tsugawa et al., 2020). Briefly, the RAW format data was first converted to the abf format using Abf converter (https://www.reifycs.com/abfconverter/). Then, MS-DIAL software ver. 4.9 (http://prime.psc.riken.jp/compms/index.html) was used to perform feature detection on all ions with the following parameters: the MS1 and MS2 tolerances were set to 0.01 Da and 0.025 Da, respectively; the identification score cut-off was set to 80%; in positive mode, [M + H]+, [M + NH4]+, [M + Na]+ and [M + H-H2O]+ were selected as the adduct types; in negative mode, [M - H]− was selected as the adduct type; and for alignment of all ion features, the retention time tolerance was set to 0.05 min and the MS1 tolerance to 0.015 Da.
The differential lipids between ESCC and HC groups in both the train set and the test set were illustrated using heatmaps.

Diagnostic significance of plasma lipid
To assess the diagnostic value of plasma lipids, receiver operating characteristic (ROC) curve analysis was performed for each differentially expressed lipid in the train set using the R package pROC. This analysis allowed calculation of AUC values. The top nine lipids with the highest AUC values in the ROC curve were selected as the most diagnostic plasma lipids.
To determine the optimal cutoff value in the train set, the threshold was set at the point where the Youden Index (Sensitivity + Specificity − 1) was maximized. This approach aims to find the threshold that maximizes the difference between the true positive rate (sensitivity) and the false positive rate (1 − specificity), striking a balance between sensitivity and specificity.
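The Youden criterion can be illustrated with a small threshold search. This is a sketch, not the pROC procedure used in the study: it scans the observed values as candidate cutoffs (pROC's own threshold grid may differ), and the argument higher_in_cases is a hypothetical switch for lipids that are up- or down-regulated in ESCC.

```python
def youden_cutoff(values, labels, higher_in_cases=True):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    labels: 1 = ESCC, 0 = healthy control.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_cut, best_j = None, -1.0
    for cut in sorted(set(values)):
        if higher_in_cases:
            predicted = [v >= cut for v in values]
        else:
            predicted = [v <= cut for v in values]
        tp = sum(p and y == 1 for p, y in zip(predicted, labels))
        tn = sum((not p) and y == 0 for p, y in zip(predicted, labels))
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j


# Toy intensities for a perfectly separating, up-regulated lipid (made up).
cutoff, j_stat = youden_cutoff([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1])
```

In the toy example the cutoff is 4 with J = 1, the value obtained when sensitivity and specificity are both 100%.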
To evaluate the diagnostic value of the plasma lipids, the cutoff value determined in the train set was used to predict the classification of samples in the test set. The performance of each lipid was then assessed using a confusion matrix, which allowed calculation of diagnostic metrics such as sensitivity, specificity, and accuracy. By following this procedure, the study aimed to identify the most informative plasma lipids for diagnostic purposes and evaluate their performance in classifying samples in both the train and test sets.
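The confusion-matrix metrics named above follow directly from the four cell counts. The sketch below is illustrative; the predictions in the example are hypothetical and are not results from the study, although the group sizes (12 ESCC, 11 HC) match the test set.

```python
def diagnostic_metrics(y_true, y_pred):
    """Compute confusion-matrix metrics; 1 = ESCC, 0 = healthy control."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as None.
        return num / den if den else None

    return {
        "sensitivity": ratio(tp, tp + fn),   # true positive rate (recall)
        "specificity": ratio(tn, tn + fp),   # true negative rate
        "precision": ratio(tp, tp + fp),     # positive predictive value
        "accuracy": ratio(tp + tn, len(pairs)),
    }


# Hypothetical test-set outcome: all 12 ESCC correct, 1 of 11 HC misclassified.
y_true = [1] * 12 + [0] * 11
y_pred = [1] * 12 + [1] + [0] * 10
metrics = diagnostic_metrics(y_true, y_pred)
```

With 23 test samples, a single misclassification corresponds to an accuracy of 22/23 ≈ 95.7%, which shows how coarse the accuracy scale is at this sample size.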
In addition to single lipid-based diagnostic models, multiple lipid-based ML models were investigated, including partial least squares (PLS) and random forest (RF) from the caret package, and support vector machine (SVM) from the e1071 package. ROC curves were plotted for each model using the train set, and their prediction performance was evaluated in the test set using confusion matrix calculations.

Relationship between plasma lipid levels and clinical features
To establish the relationship between differentially expressed plasma lipid levels and clinical features such as sex, age, drinking history, smoking history, lymph node metastasis (LNM), and TNM stage, statistical tests were conducted. The Kruskal-Wallis test or Wilcoxon test was employed to compare the lipid levels among different groups in all samples. A p-value of less than 0.05 was considered statistically significant, indicating a significant association between the lipid levels and the clinical features being examined.

RESULTS
Basic characteristics of the participants
This study enrolled 40 patients with pathologically diagnosed ESCC and 31 healthy controls (HCs). Age and sex were matched between the two groups without any statistically significant difference, as shown in Table 1. Among the ESCC patients, nine were in stage I, 15 were in stage II, and 16 were in stage III, while no patients were in stage IV. The samples were randomly divided into a training set (ESCC = 28, HC = 20) and a test set (ESCC = 12, HC = 11).

Metabolic shift between ESCC and HC
Metabolic features were analyzed in the train set and are depicted in Fig. 1. Following peak picking, retention time alignment, grouping, and signal shift correction, a total of 41,028 ion features were obtained, including 22,591 positive ions and 18,437 negative ions. Based on these metabolic features, PCA and PLS-DA score plots were generated to investigate the differences between ESCC and HC groups. The results exhibited a significant separation between the ESCC and HC groups, indicating notable dysregulation in the plasma lipid profile of ESCC patients (Figs. 1A and 1B). The reliability of the PLS-DA result was further validated using a permutation test (n = 20) (Fig. 1C). Furthermore, based on the criteria for differential features (FDR < 0.05; VIP > 1.0; FC > 1.50 or FC < 0.667), a total of 5,899 differential metabolic features were shown in the volcano plot, including 2,654 up-regulated features and 3,245 down-regulated features (Fig. 1D).

Differentially expressed plasma lipids in ESCC
A total of 99 differentially expressed plasma lipids were identified between the ESCC and HC groups. Among these lipids, 15 were upregulated (FC >1.5) and 84 were downregulated (FC <0.667) in the plasma of ESCC patients compared to HC. The detailed information of these lipids is summarized in Table 2. A heatmap displaying the expression patterns of the differentially expressed lipids in the train set (Fig. 2A) revealed evident differences between ESCC and HC. Most of these lipids (84 of 99) exhibited a downward trend in ESCC patients compared to HC. Similar expression patterns were observed in the test set (Fig. S1). The proportion of lipid classifications is presented in the pie chart (Fig. 2B), including 21 fatty acids (FAs), 22 glycerolipids (GLs), 37 glycerophospholipids (GPs), 18 sphingolipids (SPs), and one sphingomyelin (SM). GPs represented the largest proportion of differential lipids at 37%, with 34 downregulated lipids and three upregulated lipids. Among the upregulated differential lipids (Fig. 2C), there were eight FAs, three GPs, three SPs, and one GL. Among the downregulated differential lipids (Fig. 2D), there were 13 FAs, 21 GLs, 34 GPs, 15 SPs, and one SM.

Diagnostic performance of the differential plasma lipids
ROC curve analysis revealed that plasma lipids show promise as diagnostic biomarkers for ESCC. The top nine lipids with the highest AUC values were FA 15:4, FA 27:1, FA 28:7, FA 28:0, FA 36:0, FA 39:0, FA 42:0, FA 44:0, and DG 37:7. Eight of these nine lipids achieved an AUC value of 1.00, indicating excellent diagnostic accuracy. FA 15:4 and DG 37:7 showed an up-regulated trend in ESCC samples, while the remaining lipids showed a down-regulated trend (Fig. 3). In the testing step, confusion matrix charts (Fig. 4) were constructed for a more intuitive representation of the diagnostic performance of the top nine lipids. All nine lipids depicted in the chart exhibited prediction accuracy exceeding 85%, with six exceeding 95%. Seven of these nine lipids achieved 100% diagnostic efficiency in detecting tumors. The top 15 lipids, ranked by their prediction accuracy in the test set, were selected. Table 3 presents their AUC values in the train set, as well as prediction accuracy, sensitivity, specificity, precision, and recall values in the test set. Among these lipids, 13 belonged to FAs, and 11 lipids (FA 27:1, FA 28:7, FA 28:0, FA 39:0, FA 42:0, FA 44:0, FA 22:2, FA 36:0, FA 25:0, FA 19:1, FA 27:0) achieved prediction accuracy exceeding 0.90. The results for the remaining differential lipids can be found in Table S1.
Multiple lipid-based models were constructed using the training set data. The ROC curves in Figs. 5A, 5B and 5C showed high AUC values of 0.990, 0.990, and 0.980 for the PLS, RF, and SVM models, respectively, indicating excellent prediction performance. These results were consistent with the performance of individual differential lipids. The accuracy of the models was further validated in the test set, achieving accuracies of 95.7% (Figs. 5D, 5E and 5F). These findings demonstrate the effectiveness of the multiple lipid-based models as accurate diagnostic tools for ESCC.

Relationship between plasma lipid levels and clinical features
The results revealed that 14 differentially expressed plasma lipids were associated with age (Fig. S2), 14 with sex (Fig. S3), 18 with smoking history (Fig. S4), and only one with drinking history (Fig. S5).

DISCUSSION
ESCC is characterized by rapid progression and poor prognosis; therefore, early detection and screening are particularly important (Napier, Scheerer & Misra, 2014). Compared to current traditional methods like esophageal endoscopy biopsy and imaging examination, plasma analysis conducted in our study offers the advantages of non-invasiveness and convenience, regardless of equipment and operator expertise (Codipilly et al., 2018; He & Ke, 2020). Meanwhile, the increasing attention on blood tests for disease surveillance is attributed to their specific ability to reveal alterations within the internal environment.
However, the sensitivity and specificity of blood tests for ESCC screening, involving autoantibodies to tumor-associated antigens (TAAs), circulating tumor cells (CTCs), and circulating miRNA, can vary significantly among patients, pathological types, and disease stages (Codipilly et al., 2018; He & Ke, 2020). To achieve a non-invasive, convenient, and highly accurate diagnostic method for ESCC, our study utilized plasma-based lipidomics analysis combined with ML techniques to develop an early diagnostic model based on plasma lipids. The model was constructed using PLS, RF, and SVM algorithms and validated using a test set, thereby enhancing the reliability of the proposed approach.
This study provides evidence of significant dysregulation in the plasma lipid profile of patients with ESCC compared to HC. Dysregulated lipids in ESCC patients include fatty acids (FA), diacylglycerols (DG), and triglycerides (TG). FA exhibit both upregulation and downregulation in ESCC patients. Upregulation of FA in the plasma can be attributed to enhanced FA synthesis capacity in tumor cells, facilitated by abnormal expression and activity of FA synthesis-related enzymes such as fatty acid synthase (FASN), ATP-citrate lyase (ACLY), stearoyl-CoA desaturase (SCD1), and acetyl-CoA carboxylase (Lien et al., 2021). Transcriptional regulation by key transcription factors like SREBP1 and PRP19 (Zhang et al., 2023), as well as activation of signaling pathways such as PI3K/Akt/mTOR and MAPK pathways (Yu et al., 2022), may also contribute to FA upregulation and enhance tumor cell proliferation. Conversely, tumor cells may increase FA uptake and utilization, which can lead to a decrease in FA concentration. The heightened metabolic activity of tumor cells promotes FA oxidation for energy production, biological membrane synthesis, and signal transduction (Yuan et al., 2021), further contributing to lower FA concentrations in the plasma. DG and TG consistently show a downward trend in ESCC patients. The downregulated concentrations of DG and TG in the plasma may be linked to the heightened metabolic activity of tumor cells. Rapidly proliferating tumor cells have increased energy demands, resulting in the consumption of DG and TG through enhanced FA oxidation and energy utilization. This may explain the observed downregulation of DG and TG levels in the plasma in this study.
In terms of diagnostic models, previous research relied mainly on traditional single-biomarker models. However, with advances in omics research, ML-assisted multivariate models have become increasingly common. In this context, it is worth noting that individual lipids showed excellent diagnostic performance in our study, while the ML-based multiple-lipid models did not perform significantly better. One likely reason is the small sample size, which hampers the effectiveness of multivariate modeling and limits the ability to highlight its advantages. This issue can be addressed by expanding the sample size, which allows a more comprehensive analysis of multiple factors. In future research, we will make additional efforts in data collection and group refinement to optimize the diagnostic models.
Several studies have shown that lipid metabolism may have an impact on tumor progression. Jiao et al. (2023) revealed the influence of the lipid metabolism-related enzyme LPCAT on the progression of ESCC. Therefore, the Kruskal-Wallis test and Wilcoxon test were performed to examine the association of differential lipids with clinical information of ESCC patients. We observed distinct lipid metabolism profiles between samples with or without lymph node metastasis. Furthermore, the metabolic levels of the same lipid varied across different stages of ESCC. Our findings suggest that lipid metabolism levels may serve as indicative factors for patient staging and the presence of lymph node metastasis.
The present study has limitations that should be acknowledged. The small sample size from a single center restricts the robustness and generalizability of the data, as it may lack representativeness, be prone to bias, have reduced statistical power, and have limited applicability to other populations or settings. Additionally, the absence of a control group comprising high-risk individuals with benign lesions is a limitation. Including such a control group would enable the evaluation of the diagnostic model's accuracy, sensitivity, and specificity for ESCC. Moreover, it would facilitate the identification of specific metabolic profiles that differentiate ESCC from benign lesions, assess test performance, and enhance internal validity. Furthermore, the study only utilized plasma samples for lipid metabolism profiling analysis, without considering serum or tissue samples. To improve future studies, it is recommended to increase the sample size, involve multiple centers, include a control group with benign lesions, and incorporate a wider range of experimental samples from serum and tissue. These enhancements would contribute to more robust and comprehensive research.

CONCLUSION
In conclusion, this study has developed a novel diagnostic model for ESCC by integrating plasma-based lipidomics and ML algorithms, which may enable more efficient and accurate clinical diagnosis. Furthermore, the identified prognostic lipid markers exposed the dysregulated lipid metabolism in ESCC, which may provide new therapeutic targets to guide clinical treatment. In summary, this study advances the understanding of cancer diagnostic model construction by combining metabolomics and ML algorithms. This approach holds promise for cancer diagnosis and has the potential to advance cancer treatment. However, it is important to acknowledge that the current model only focuses on ESCC, and separate models for different pathological types of esophageal cancer need to be developed and validated. Moreover, the heterogeneity of metabolic characteristics among different patients and disease stages should be considered, and further optimization of the diagnostic model is necessary. There is still substantial work to be done before the model can be effectively implemented in clinical settings, highlighting the need for ongoing research in this area.
• Zhongjian Chen conceived and designed the experiments, performed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the article, and approved the final draft.
• Weimin Mao conceived and designed the experiments, performed the experiments, analyzed the data, authored or reviewed drafts of the article, supervision, and approved the final draft.

Human Ethics
The following information was supplied relating to ethical approvals (i.e., approving body and any reference numbers): Medical Ethics Committee of Zhejiang Cancer Hospital (IRB-2019-66).

Figure 1 Separation of patients with ESCC and healthy controls based on ion features in the train set. (A) Principal components analysis (PCA) score plot distinguishing ESCC patients from healthy controls based on plasma-detected ions in the train set. C, ESCC patient; N, healthy control. (B) Partial least squares discriminant analysis (PLS-DA) score plot differentiating between ESCC and healthy control groups in the train set. (C) Permutation test (n = 20) confirming the validity of the PLS-DA model. (D) Volcano plot displaying differentially expressed ion features in the train set samples. Red dots represent significantly upregulated ions (VIP > 1.0, FDR < 0.05, and fold change > 1.50), blue dots represent significantly downregulated ions (VIP > 1.0, FDR < 0.05, and fold change < 0.667), and grey dots represent ions without significant change. Full-size DOI: 10.7717/peerj.17272/fig-1

Figure 2 Differential plasma lipids detected in the train set. (A) Heatmap clustering the 99 differential lipids in the train set revealed a lipid metabolism shift in ESCC patients (n = 28) compared with healthy controls (n = 20). (B) Pie chart showing the proportion of lipid classifications among all the differential lipids above. (C) Pie chart showing the proportion of lipid classifications among up-regulated differential lipids above. (D) Pie chart showing the proportion of lipid classifications among down-regulated differential lipids above. FA, fatty acid; GL, glycerolipid; GP, glycerophospholipid; SM, sphingomyelin; SP, sphingolipid. Full-size DOI: 10.7717/peerj.17272/fig-2

Figure 5 Receiver operating characteristic (ROC) curves and diagnostic performance of the multiple lipid-based diagnostic models. ROC curves of multiple-lipid predictive models constructed using PLS (A), RF (B), and SVM (C) algorithms in the train set. Confusion matrices in the test set assessing the diagnostic accuracy of the multiple-lipid predictive models above: (D) PLS, (E) RF, and (F) SVM. Full-size DOI: 10.7717/peerj.17272/fig-5

Figure 6 Relationship between lymph node metastasis and differentially expressed plasma lipids. Boxplots illustrating the distribution of peak relative intensity of differential lipids in patients with or without lymph node metastasis. YES, ESCC patients with lymph node metastasis; NO, ESCC patients without lymph node metastasis. The Wilcoxon test was performed to identify differential lipids associated with lymph node metastasis in ESCC patients. Lipids were selected based on a significance threshold of P-value < 0.05. * P ≤ 0.05, ** P ≤ 0.01, and **** P ≤ 0.0001. Full-size DOI: 10.7717/peerj.17272/fig-6

Figure 7 Relationship between tumor stage and differentially expressed lipids. Boxplots illustrating the distribution of peak relative intensity of differential lipids in ESCC patients across stages I, II, and III. The Kruskal-Wallis test was performed to identify differential lipids associated with TNM stage in ESCC patients. Lipids were selected based on a significance threshold of P-value < 0.05. * P ≤ 0.05, ** P ≤ 0.01, *** P ≤ 0.001, and **** P ≤ 0.0001. Full-size DOI: 10.7717/peerj.17272/fig-7

Table 1 Participant characteristics.
Notes. a Healthy control. b Patient with esophageal squamous cell carcinoma. c P-value based on

Table 2
a Mass-to-charge ratio. b Retention time. c Variable importance in projection. d P-value based on Student's t-test; P-value <0.05 represents a significant difference between the two comparison groups. e Fold change.

Table 3 Lipids of top 15 accuracy in prediction of test set by ROC analysis.
a Area under the curve of ROC. b Positive predictive value. c True positive rate.