Characteristics of auto-quantified tumor-infiltrating lymphocytes and the prognostic value in adenocarcinoma of the esophagogastric junction, gastric adenocarcinoma, and esophageal squamous cell carcinoma

Background: Adenocarcinoma of the esophagogastric junction (AEGJ), which has a specific pathological profile and a poor prognosis, has limited therapeutic options. Previous studies have found that tumor-infiltrating lymphocytes (TILs) exhibit distinct characteristics in different tumors and are correlated with tumor prognosis. We established cellular training sets to obtain auto-quantified TILs from pathological images, and we compared the characteristics of TILs in AEGJ with those in esophageal squamous cell carcinoma (ESCC) and gastric adenocarcinoma (GAC). Our aims were to gain insight into the distinct immune environments of these three tumors and to investigate the prognostic value of TILs in each. Methods: Using a case-control study design, we analyzed 214 AEGJ, 256 GAC, and 752 ESCC cases. The TCGA dataset was used to validate the prognostic value of auto-quantified TILs. Experienced pathologists established a specific cellular training set for each cancer to determine TILs counts. The Kruskal-Wallis test and multivariable linear regression were used to explore TILs characteristics. Survival analyses were performed with the Kaplan-Meier method and the Cox proportional hazards model. Results: The cellular training sets for the three cancers achieved classification accuracies of at least 87.55%. The median auto-quantified TILs proportions of AEGJ, GAC, and ESCC cases were 4.82%, 1.92%, and 0.12%, respectively. TILs proportions varied across clinicopathological traits. A higher TILs proportion was associated with better prognosis in esophagogastric cancers (all P < 0.05). It was also an independent prognostic biomarker for AEGJ in both datasets (Taixing: HR = 0.965, 95% CI = 0.938–0.994; TCGA: HR = 0.811, 95% CI = 0.712–0.925). Conclusions: Image processing algorithms revealed variations in TILs across ESCC, GAC, and AEGJ. Additionally, TILs are an independent prognostic factor in these three cancers.
This enhances our understanding of the distinct immune characteristics of the three tumors and may help improve patient outcomes.


INTRODUCTION
Globally, gastric and esophageal cancers were the fourth and sixth leading causes of cancer-related death in 2020, responsible for 769,000 and 544,000 deaths, respectively [1]. Many gastroesophageal cancers are advanced or metastatic at diagnosis, so the overall 5-year survival rate remains below 20% in developing countries [2,3]. The main factors affecting the prognosis of gastroesophageal cancers include tumor stage and grade, treatment method, living conditions, and genetic markers [4,5]. However, junctional cancer arises in the transitional region between the esophageal squamous epithelium and the gastric glandular epithelium. The oncological principles for esophageal and gastric cancer may therefore not apply directly to it [6].
Accumulating studies reveal that the molecular characteristics, pathological course, and clinical behavior of junctional cancer differ from those of gastric and esophageal cancer [7]. Junctional cancer primarily refers to adenocarcinoma of the esophagogastric junction (AEGJ). Under Siewert's anatomical classification, it includes distal esophageal adenocarcinoma (EAC), cardia cancer, and proximal gastric adenocarcinoma (GAC) [8]. AEGJ incidence has risen rapidly in East Asia, North America, and Europe over the last few decades [9]. Early symptoms are inconspicuous and the disease progresses rapidly. As a result, AEGJ is usually diagnosed at a late stage and has a 5-year survival rate of ~6% in the developing world [10]. There is therefore an urgent need to identify molecular markers that can predict prognosis and help improve it.
Tumor-infiltrating lymphocytes (TILs) are a specific component of the tumor immune microenvironment (TME). They reflect host-tumor immune interactions and are predictive of patient prognosis [11,12]. TILs primarily include T cells, B cells, and natural killer (NK) cells. These cells interact with tumor cells by releasing chemokines and cytokines, which act as important tumorigenic and prognostic factors and determine tumor progression and aggressiveness [13]. Different cancer types have distinct TMEs. Numerous clinical studies evaluating TIL content in breast carcinoma, colorectal carcinoma, and non-small cell lung carcinoma have reported that higher TIL infiltration confers a significant survival benefit [14][15][16]. In addition, variations in TILs levels can help identify the populations or cancer types most likely to respond to immunotherapy [13]. However, few studies have examined the characteristics of TILs in esophagogastric tumors or their potential as prognostic markers to predict and improve survival in AEGJ patients. Additionally, the association between TILs and survival in esophageal squamous cell carcinoma (ESCC) and GAC remains controversial [17,18].
The gold standard for evaluating TILs is semi-quantitative scoring of routine haematoxylin-eosin (H&E)-stained sections, but this approach is costly and subject to interobserver variability [19]. Computational pathology has shown promise in recognizing tissue biomarkers and can overcome the limitations of manual grading and human bias [20]. We therefore established cellular training sets for AEGJ, GAC, and ESCC based on the assessments of experienced pathologists. We then quantified TILs on H&E-stained sections using an open-source image processing tool that requires minimal user intervention. We compared the TILs characteristics of AEGJ with those of GAC and ESCC across demographic factors and clinical traits. We also examined the association of auto-assessed TILs, as a quantitative variable, with overall survival in the Taixing and TCGA datasets.

Demographic information
Table 1 displays the demographic information of the 214 AEGJ, 256 GAC, and 752 ESCC cases included in the analysis. AEGJ, GAC, and ESCC differed significantly in age, sex, tea drinking, wealth score, first-line treatment method, TNM stage, tumor differentiation grade, Helicobacter pylori (HP) infection status, and gastric atrophy (all P < 0.05). Compared with ESCC and GAC cases, AEGJ cases were more likely to be older (mean age: 69.23 years), drink less tea (77.10%), have positive HP status (78.50%), receive combination therapy (24.30%), have advanced TNM stage (28.97%), and have gastric atrophy (25.23%).

Automated cellular recognition accuracy
We obtained matched H&E-stained images of the 214 AEGJ, 256 GAC, and 752 ESCC cases, each from a solid tumor cross-section. The image processing approach automatically segmented the images and classified the cellular components into cancer cells, lymphocytes, and stromal cells.
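Once every segmented nucleus carries one of the three class labels, the per-image TILs proportion is the lymphocyte count divided by all classified cells. Agreement with pathologists' counts can then be summarized with a correlation coefficient. The Python sketch below illustrates both steps under stated assumptions. The label strings and helper names are hypothetical, and the text does not say which correlation coefficient was used, so Pearson's r is an assumption.

```python
from collections import Counter
from math import sqrt

# Hypothetical label names for the three classes described in the text.
CELL_CLASSES = ("cancer", "lymphocyte", "stroma")


def cell_proportions(labels):
    """Return the fraction of classified nuclei in each class for one image."""
    counts = Counter(labels)
    total = sum(counts[c] for c in CELL_CLASSES)
    if total == 0:
        raise ValueError("image contains no classified nuclei")
    return {c: counts[c] / total for c in CELL_CLASSES}


def pearson_r(x, y):
    """Pearson correlation between automated and manual proportions (assumed metric)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Toy image: 70 cancer cells, 10 lymphocytes, 20 stromal cells (invented numbers).
labels = ["cancer"] * 70 + ["lymphocyte"] * 10 + ["stroma"] * 20
print(cell_proportions(labels)["lymphocyte"])  # 0.1, i.e. a TILs proportion of 10%
```

In practice, `pearson_r` would be called with one automated and one manual proportion per tissue sample.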
Classification used an SVM classifier trained on cellular training sets that pathologists had annotated according to cell features (Figure 1A). Cross-validation within the cellular training sets of the three cancers yielded overall classification accuracies of at least 87.55% (Supplementary Table 1). The overall correlation coefficients between automated recognition and the pathologists' quantitative assessment for AEGJ, GAC, and ESCC were 0.92, 0.93, and 0.93, respectively, and the TILs correlation coefficients were 0.94, 0.95, and 0.95, respectively. The correlation coefficients for cancer cells and stromal cells were all >0.86 (Figure 1B; Supplementary Figure 1A, 1B). Furthermore, the automatically recognized TILs proportions in AEGJ, GAC, and ESCC were consistent with manual grading, with all Jonckheere-Terpstra tests yielding P = 0.001 (Figure 1C; Supplementary Figure 1C, 1D). The AEGJ, GAC, and ESCC cellular training sets are provided in Supplementary Datasets 1-3.
[…] (Figure 2A-2I). The association between TILs proportion and demographic information in the three gastroesophageal cancers is listed in Table 2. In AEGJ cases, the TILs proportion was associated with the first-line treatment method (P = 0.030). Patients receiving combination therapy had the highest TILs proportion (median: 5.72%), compared with those receiving radiotherapy (median: 1.83%), chemotherapy (median: 4.10%), or surgery (median: 2.77%). A similar association between TILs and first-line treatment method was observed in GAC patients (P = 0.031). In addition, GAC patients who drank more tea (median TILs: 2.74%) were more likely to have a higher TILs proportion (P = 0.028). In ESCC cases, patients who ate fewer pickles (median TILs: 0.15%) were more likely to have a higher TILs level (P = 0.002), and trend tests showed that the TILs proportion increased with BMI rank (P = 0.008).
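The Jonckheere-Terpstra test reported above checks whether a continuous value (here, the automated TILs proportion) increases across ordered groups, such as manual TIL grades or BMI ranks. A minimal sketch of the statistic with its large-sample normal approximation follows. The text does not state which software, tie handling, or exact versus asymptotic p-values were used. The simplifications here (ties between groups counted as 0.5, no tie correction in the variance) are assumptions.

```python
from math import erfc, sqrt


def jonckheere_terpstra(groups):
    """Jonckheere-Terpstra test for an ordered alternative.

    groups: samples listed in the hypothesized increasing order
    (e.g., manual TIL grades from low to high, or BMI ranks).
    Returns (J statistic, z score, two-sided normal-approximation p-value).
    Ties between groups count 0.5; the variance omits the tie correction.
    """
    j_stat = 0.0
    for a in range(len(groups)):
        for b in range(a + 1, len(groups)):
            for x in groups[a]:
                for y in groups[b]:
                    if x < y:
                        j_stat += 1.0
                    elif x == y:
                        j_stat += 0.5
    sizes = [len(g) for g in groups]
    n = sum(sizes)
    mean = (n * n - sum(s * s for s in sizes)) / 4
    var = (n * n * (2 * n + 3) - sum(s * s * (2 * s + 3) for s in sizes)) / 72
    z = (j_stat - mean) / sqrt(var)
    p_two_sided = erfc(abs(z) / sqrt(2))  # P(|Z| >= |z|) for standard normal Z
    return j_stat, z, p_two_sided


# Toy TILs proportions (%) in three ordered manual grades (invented values).
j, z, p = jonckheere_terpstra([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# J = 27.0, z = 3.0 (perfectly ordered groups)
```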

The distribution of auto-quantified TILs proportion
We first performed a crude comparison of the TILs proportion in AEGJ with those in ESCC and GAC using the Kruskal-Wallis H test. The differences in TILs proportions were statistically significant (P < 0.001).
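The crude comparison can be sketched with a tie-corrected Kruskal-Wallis H statistic plus the Bonferroni adjustment mentioned in the following step. With three cancer types there are 2 degrees of freedom, and the chi-square tail probability has the closed form exp(-H/2). The helper below covers only even degrees of freedom, which is a simplification. The pairwise post hoc test applied after Kruskal-Wallis is not named in the text, so only the Bonferroni adjustment itself is shown. All function names are illustrative.

```python
from math import exp, factorial


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def chi2_sf_even_df(x, df):
    """Chi-square upper-tail probability; this closed form is valid only for even df."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2
    return exp(-half) * sum(half ** i / factorial(i) for i in range(df // 2))


def kruskal_wallis(*groups):
    """Tie-corrected Kruskal-Wallis H and its chi-square p-value (df = groups - 1)."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    weighted = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * weighted - 3 * (n + 1)
    tie_counts = {}
    for v in pooled:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    h /= 1 - sum(t ** 3 - t for t in tie_counts.values()) / (n ** 3 - n)
    return h, chi2_sf_even_df(h, len(groups) - 1)


def bonferroni(p_values):
    """Bonferroni adjustment for a family of pairwise comparisons."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Three cancers -> df = 2; toy TILs proportions (%), not study data.
h, p = kruskal_wallis([0.1, 0.2, 0.3], [1.5, 1.9, 2.2], [4.1, 4.8, 5.6])
```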
Bonferroni-corrected multiple comparisons showed that AEGJ cases had the highest TILs proportion (median, 4.82%), followed by GAC (median, 1.92%), with ESCC the lowest (median, 0.12%) (Supplementary Figure 3). However, lymphocyte infiltration is associated with age, sex, BMI, TNM stage, tumor differentiation grade, and first-line treatment method [21][22][23]. We therefore compared the TILs proportion among the three cancers within each category of these factors and of the demographic variables that differed among the cancers. Significant differences among the three cancers persisted (Supplementary Table 2; all P < 0.001). Moreover, the distribution of the auto-quantified TILs proportion was examined with Spearman correlation analysis and multivariable linear regression. After cancer type was added as a variable alongside the adjusted covariates, it remained associated with TILs proportion (Table 3; ρ = 0.49; P < 0.001). The standardised effects of AEGJ and GAC were 0.36 and 0.28, respectively (Table 3; R² = 0.204; adjusted R² = 0.179; P < 0.001). This indicated that the […]
The AEGJ, GAC, and ESCC cases from the TCGA dataset were divided into high- and low-TILs groups using median TILs proportions of 1.99%, 4.14%, and 32.40%, respectively, as cut-offs, and the prognostic value of TILs was evaluated in both datasets. Kaplan-Meier plots were generated to compare OS between the high- and low-TILs groups. A higher TILs proportion was significantly associated with better OS in all three cancers in the Taixing dataset (Figure 3A, 3C, 3E; all P < 0.001).
As validation, we also found statistically significant differences in OS between the high- and low-TILs groups in the TCGA dataset (Figure 3B, 3D, 3F; all P < 0.05).
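The survival comparison above has two steps: a median split on the TILs proportion, then a Kaplan-Meier (product-limit) estimate for each group. A minimal sketch follows. The handling of cases exactly at the median is not stated in the text, so placing them in the low group is an assumption. The test used to compare the two curves (commonly a log-rank test) is not shown.

```python
from statistics import median


def split_at_median(tils):
    """Label each case 'high' or 'low' relative to the cohort median TILs proportion.

    Cases exactly at the median go to 'low' here (an assumption).
    """
    cut = median(tils)
    return ["high" if v > cut else "low" for v in tils]


def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times: follow-up times; events: 1 = death observed, 0 = censored.
    Returns [(time, survival probability)] at each distinct death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # deaths and censored cases leave the risk set
    return curve


# Toy cohort (months, event flag, TILs %), invented for illustration.
months = [5, 8, 12, 20, 30, 34]
event = [1, 1, 0, 1, 0, 0]
tils = [0.5, 1.0, 2.0, 4.0, 6.0, 9.0]
groups = split_at_median(tils)
high = [i for i, g in enumerate(groups) if g == "high"]
curve_high = kaplan_meier([months[i] for i in high], [event[i] for i in high])
```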

DISCUSSION
A standardized methodology for manual TILs assessment exists, but it has several limitations: it requires professional pathologists, is subject to interobserver variability, and is costly. To address these problems, we established a cellular training set for each cancer and explored the characteristics of auto-quantified TILs in AEGJ, GAC, and ESCC.
We also investigated the prognostic value of auto-assessed TILs in these esophagogastric tumors. The TILs proportion differed across demographic and clinical traits. In the Taixing dataset, it was highest in AEGJ compared with GAC and ESCC. The auto-quantified TILs were an independent prognostic biomarker for AEGJ, GAC, and ESCC.
TILs infiltration differed by body mass, eating habits, and cancer treatment method. The tumor microenvironment (TME) is a specific metabolic niche composed of various cellular components and the contents of the tumor interstitial space. Recent research has shown that high-fat diet-induced obesity increases fat uptake by tumor cells while suppressing energy uptake by CD8+ T cells [24]. These distinctive adaptations impaired lymphocyte infiltration. However, our study showed that the TILs proportion tended to increase with BMI rank. This may be related to the types of TILs involved. TILs include diverse immune cells, e.g., T cells, B cells, and NK cells [25]. Because we treated TILs as a single category, associations specific to subsets such as CD8+ T cells may have been weakened. Our study also found that patients who drank more tea or ate fewer pickles were more likely to have higher TILs proportions. These results are consistent with previous experimental findings. Mantena et al. [26] showed that tea polyphenols increased the recruitment of TILs into the TME. In addition, eating excessive pickles increases nitrite intake. Nitrite readily oxidizes hemoglobin to methemoglobin, which lowers the oxygen-carrying capacity of blood and facilitates the formation of an immunosuppressive environment [27]. In our cases, patients receiving combination therapy also had the highest TILs levels compared with those receiving radiotherapy, chemotherapy, or surgery. These results are consistent with reports that combination therapy has fewer adverse effects on immune cells than any of these three methods alone [28]. Hence, maintaining a healthy diet and body condition and choosing a suitable cancer therapy may help improve immune infiltration against tumor growth.
TILs infiltration showed cancer specificity among esophagogastric cancers. Quantification of TILs is growing in importance as evidence emerges that TILs are a reliable biomarker of response to immunotherapeutic agents [29]. Characterizing TIL proportions across solid tumors could therefore help explain the varied effectiveness of immunotherapy. In clinical practice, AEGJ, located between the esophagus and stomach, is often grouped with GAC. Nevertheless, increasing evidence suggests that AEGJ differs significantly in its immune molecular characteristics [30].
In this study, TILs proportions varied among esophagogastric cancers, with AEGJ showing the highest proportion. The absolute difference in TILs proportion between AEGJ and GAC was smaller than that between ESCC and AEGJ or between ESCC and GAC. Our results are similar to those of previous studies. Mohamed et al. [31] […] [32]. Our results support and complement these findings, indicating that AEGJ differs from GAC and ESCC in its degree of lymphocyte infiltration. In addition, variations in TILs infiltration across cancers may indirectly reflect differences in response to immunotherapy. Our results might therefore inform the selection of immunotherapy for specific esophagogastric cancers. However, some trials of PD-1/PD-L1 blockade in upper gastrointestinal cancers enrolled patients with gastric cancer and AEGJ without distinction [33]. Our results thus also provide clues for future clinical immunotherapy in esophagogastric cancers and may support precision therapy.
The auto-assessed TIL proportion is an independent prognostic biomarker in AEGJ, GAC, and ESCC patients. Many studies have examined the association between semiquantitative TILs scores and prognosis in esophagogastric cancers, and high TILs scores have been reported as a positive prognostic marker [34][35][36][37]. Despite standardization efforts, the subjective nature and higher cost of manual scoring have limited its adoption into clinical practice [38]. Moreover, prognostic biomarkers for AEGJ remain under-explored. We therefore used an automated algorithm to quantify the TILs percentage and investigated its prognostic value. Survival analysis of AEGJ showed that the auto-quantified TILs proportion was an independent prognostic biomarker in both the Taixing and TCGA datasets. This finding is consistent with two previous studies [39,40] in which the TILs proportion was estimated by pathologists. Several researchers have reported that the prognostic value of TILs in GAC and ESCC has not been established [41][42][43]. Other studies, however, have reported positive results [44][45][46][47]. In the present study, a higher auto-assessed TILs proportion was associated with better overall survival in GAC and ESCC cases in both datasets, supporting the prognostic value of the TILs proportion in these cancers. Hence, the auto-quantified degree of TILs infiltration may allow objective prediction of overall survival in AEGJ, GAC, and ESCC. It also has the potential for translation into routine clinical and pathological practice at minimal additional cost.
To our knowledge, this is the first study to establish relatively comprehensive cellular training sets for esophagogastric tumors to automatically quantify TILs infiltration in AEGJ, GAC, and ESCC. We obtained relatively complete demographic and clinical information to explore the TILs characteristics.
These findings may contribute to more accurate tumor classification and prediction of immunotherapy outcomes.
As an independent prognostic factor in AEGJ, the auto-quantified TILs proportion provides further evidence of its predictive value in upper gastrointestinal tumors. These findings lay a foundation for further exploration of TME differences at the level of specific immune cells. They also provide insights relevant to immunotherapy and support the prognostic value of the auto-quantified TILs proportion in esophagogastric tumors, particularly in AEGJ patients.
This study has some limitations. Our study aimed to provide clues for immunotherapy in patients with upper gastrointestinal tumors by comparing TILs levels as an immune characteristic. Because the patients in our study did not receive immunotherapy, we could not draw direct conclusions about it; nevertheless, our results provide some supporting data for immunotherapy in AEGJ. In addition, multivariable survival analysis in the validation set showed that auto-quantified TILs were not an independent prognostic factor for ESCC. This may be related to the small number of ESCC cases in TCGA. However, given the large ESCC sample in the Taixing dataset, auto-quantified TILs can still be considered an independent prognostic factor in ESCC.
Future research should investigate the association between the auto-quantified TILs proportion and clinical outcomes in patients receiving immunotherapy.
Moreover, incorporating additional independent datasets to further validate the independent prognostic value of TILs in patients with AEGJ, GAC, and ESCC will enhance the clinical value of this biomarker.

CONCLUSIONS
The

Patients and sample selection
The

Pathological image processing pipeline
We used the pathological image processing pipeline published in our previous study [51]. Images of the H&E-stained tumor sections were processed using the R package CRImage developed by Yuan et al. [52]. This tool builds on the EBImage R package and a support vector machine (SVM) [52]. It performs color transformation and segments haematoxylin-positive nuclei using Otsu thresholding and watershed segmentation. It then analyzes the morphological features of each detected nucleus, such as shape, intensity, and texture.
The resulting morphological and textural features were input into the SVM for supervised classification of cancer cells, lymphocytes, and stromal cells. Cancer cells exhibited large nuclei with variable texture and shape, whereas lymphocytes were small and round with basophilic nuclei. Cancer cells and lymphocytes could therefore be reliably distinguished from stromal cells, which contained the elongated nuclei of fibroblasts and endothelial cells.
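To show what "supervised classification of nuclear features with an SVM" means in practice, here is a from-scratch linear SVM trained by Pegasos-style stochastic sub-gradient descent on two toy features. This is not CRImage's implementation. The kernel, features, and training settings CRImage uses are not described here, and the linear model, the two-class setup (lymphocyte vs. cancer cell), and all function names are assumptions for illustration. A three-class version could train one such classifier per class (one-vs-rest).

```python
import random


def train_linear_svm(features, labels, lam=0.01, epochs=1000, seed=0):
    """Linear SVM fitted by Pegasos-style stochastic sub-gradient descent.

    features: list of numeric feature vectors (already scaled to similar ranges).
    labels: +1 or -1 for each vector.
    A constant 1.0 is appended to every vector so the bias is learned as a weight.
    """
    rng = random.Random(seed)
    xs = [list(x) + [1.0] for x in features]
    w = [0.0] * len(xs[0])
    step = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(xs)), len(xs)):  # shuffled pass over the data
            step += 1
            eta = 1.0 / (lam * step)
            margin = labels[i] * sum(wj * xj for wj, xj in zip(w, xs[i]))
            w = [(1 - eta * lam) * wj for wj in w]  # shrink: regularization term
            if margin < 1:  # hinge loss is active for this example
                w = [wj + eta * labels[i] * xj for wj, xj in zip(w, xs[i])]
    return w


def predict(w, x):
    """Return +1 or -1 from the sign of the linear score."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Invented toy features: (nuclear area / 100 px, eccentricity); +1 = lymphocyte, -1 = cancer cell.
train_x = [(0.25, 0.20), (0.30, 0.30), (0.35, 0.25), (0.30, 0.35),
           (0.90, 0.60), (1.00, 0.70), (1.10, 0.65), (0.95, 0.80)]
train_y = [1, 1, 1, 1, -1, -1, -1, -1]
w = train_linear_svm(train_x, train_y)
```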
We selected regions containing tumor cells, lymphocytes, and stromal cells from the tissue images and imported them into EBImage for conversion to the LAB color space. The mean and standard deviation of each channel were computed to convert the image to grayscale for further segmentation and cell recognition. An Otsu threshold, which maximizes the between-class variance, was then computed to partition the image into foreground and background, followed by morphological opening. Using both the grayscale image and the threshold, the algorithm eliminated noise and refined the cell edges. Finally, watershed segmentation was performed to separate cell clusters, automatically outlining the recognized cells in the image.
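The Otsu step described above picks the grayscale threshold that maximizes the between-class variance. A standalone sketch on a flattened list of 8-bit intensities follows. CRImage/EBImage compute this internally, and the LAB conversion, morphological opening, and watershed steps are not reproduced. Treating darker (haematoxylin-rich) pixels as the nuclear foreground is an assumption that depends on how the grayscale image is formed.

```python
def otsu_threshold(gray_values):
    """Otsu's method for 8-bit intensities (0-255).

    Returns the threshold t that maximizes the between-class variance when
    pixels are split into {v <= t} and {v > t}.
    """
    hist = [0] * 256
    for v in gray_values:
        hist[v] += 1
    total = sum(hist)
    weighted_total = sum(i * h for i, h in enumerate(hist))
    w_low = 0
    sum_low = 0.0
    best_t, best_between = 0, -1.0
    for t in range(256):
        w_low += hist[t]
        sum_low += t * hist[t]
        w_high = total - w_low
        if w_low == 0:
            continue
        if w_high == 0:
            break
        mean_low = sum_low / w_low
        mean_high = (weighted_total - sum_low) / w_high
        # Between-class variance (up to a constant factor of total ** 2).
        between = w_low * w_high * (mean_low - mean_high) ** 2
        if between > best_between:
            best_t, best_between = t, between
    return best_t


# Toy 'image': dark nuclear pixels (~50) on a bright background (~200).
pixels = [48, 50, 52] * 30 + [198, 200, 202] * 70
t = otsu_threshold(pixels)
nuclear_mask = [v <= t for v in pixels]  # darker pixels treated as foreground here
```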
Senior pathologists then labeled the outlined cells as cancer cells, lymphocytes, or stromal cells. The EBImage toolkit integrated within CRImage was subsequently used to extract 43 cellular features, including nucleus perimeter, major axis length, eccentricity, and the number of neighboring cells.
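Several of the 43 features are moment-based shape descriptors. The sketch below computes area, major axis length, and eccentricity of one segmented nucleus from its pixel coordinates. It uses the common "ellipse with the same second moments" convention, which may differ in detail from EBImage's internal definitions. Perimeter, texture, and neighbor counts are not covered, and the function name is illustrative.

```python
from math import sqrt


def moment_features(pixels):
    """Shape features of one segmented nucleus from its pixel coordinates.

    pixels: list of (row, col) tuples belonging to the nucleus mask.
    The coordinate covariance gives two eigenvalues; the major axis is
    4 * sqrt(larger eigenvalue), and eccentricity is
    sqrt(1 - smaller / larger), i.e. sqrt(1 - minor^2 / major^2).
    """
    n = len(pixels)
    mean_r = sum(r for r, _ in pixels) / n
    mean_c = sum(c for _, c in pixels) / n
    mu_cc = sum((c - mean_c) ** 2 for _, c in pixels) / n
    mu_rr = sum((r - mean_r) ** 2 for r, _ in pixels) / n
    mu_rc = sum((r - mean_r) * (c - mean_c) for r, c in pixels) / n
    half_trace = (mu_cc + mu_rr) / 2
    spread = sqrt(((mu_cc - mu_rr) / 2) ** 2 + mu_rc ** 2)
    lam_major = half_trace + spread
    lam_minor = max(0.0, half_trace - spread)  # guard against tiny negative rounding
    ecc = sqrt(1 - lam_minor / lam_major) if lam_major > 0 else 0.0
    return {"area": n, "major_axis": 4 * sqrt(lam_major), "eccentricity": ecc}


round_nucleus = [(r, c) for r in range(5) for c in range(5)]  # compact blob
elongated_nucleus = [(0, c) for c in range(10)]               # thin streak
```

A compact blob gives eccentricity near 0 (lymphocyte-like), while an elongated streak approaches 1 (like the fibroblast and endothelial nuclei described above).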

Figure 1 .Figure 2 .
Figure 1. Establishment and verification of the cell training set in AEGJ. (A) Example images of the three classes used in the classifier: cancer cells, lymphocytes, and stromal cells. (B) Cell proportions obtained by automated image analysis compared with pathologists' counts for a total of 10,000 single cells in a representative set of 20 AEGJ tissue samples. (C) TILs proportions versus manual grading of AEGJ TIL infiltration in a random one-third of samples.

Figure 3. Kaplan-Meier curves of OS based on TILs proportion in the discovery and validation datasets. (A, C, E) Survival analysis between the high- and low-TILs groups in 752 ESCC, 214 AEGJ, and 256 GAC cases in Taixing, China, 2010-2014. (B, D, F) Survival analysis between the high- and low-TILs groups in 70 ESCC, 169 AEGJ, and 222 GAC cases of the TCGA dataset.

Table 4. Univariable and multivariable Cox regression analyses of basic characteristics with OS in AEGJ of the Taixing dataset (Discovery, N = 214).
a aHR with adjustment for TILs proportion, age, sex, differentiation grade, first-line treatment method, and BMI.
findings of this study suggest that TILs levels determined by CRImage using three cancer-specific cell training sets show distinctive characteristics across demographic information, clinical traits, and cancer types. The auto-quantified TILs are an independent prognostic factor in AEGJ, GAC, and ESCC patients and are associated with a favorable prognosis. The auto-quantified TILs proportion is a cost-effective biomarker that may help predict and improve prognosis in clinical and pathological research.

Table 2. Comparison of TILs proportion in the 214 AEGJ, 256 GAC, and 752 ESCC cases.
The confusion matrix of the true cell classes (columns) and the predicted classes (rows) is shown. Precision was calculated as true positives/(true positives + false positives), and recall as true positives/(true positives + false negatives).
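The precision and recall definitions in the note above, together with overall accuracy (taken here as correctly classified cells divided by all cells), can be computed from a confusion matrix laid out with predicted classes in rows and true classes in columns. The sketch below uses invented counts, not the study's data, and the function name is illustrative.

```python
def confusion_metrics(matrix, classes):
    """Overall accuracy and per-class precision/recall.

    matrix[i][j] counts cells predicted as classes[i] (row) whose true class is
    classes[j] (column), matching the layout described in the table note.
    """
    k = len(classes)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(k))
    per_class = {}
    for i, name in enumerate(classes):
        tp = matrix[i][i]
        predicted = sum(matrix[i])                    # row sum = TP + FP
        actual = sum(matrix[r][i] for r in range(k))  # column sum = TP + FN
        per_class[name] = {
            "precision": tp / predicted if predicted else float("nan"),
            "recall": tp / actual if actual else float("nan"),
        }
    return correct / total, per_class


# Invented counts, not the study's confusion matrix.
classes = ["cancer", "lymphocyte", "stroma"]
matrix = [[8, 1, 0],
          [1, 9, 1],
          [1, 0, 9]]
accuracy, per_class = confusion_metrics(matrix, classes)
# accuracy = 26/30; cancer precision = 8/9, recall = 8/10
```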

Supplementary Table 3. Univariable and multivariable Cox regression analyses of basic characteristics with OS in GAC of Taixing dataset (Discovery, N = 256).
a aHR with adjustment for TILs, age, sex, grade of differentiation, first-line treatment method, and BMI.

Supplementary Table 4. Univariable and multivariable Cox regression analyses of basic characteristics with OS in ESCC of the Taixing dataset (Discovery, N = 752).
a aHR with adjustment for TILs, age, sex, grade of differentiation, first-line treatment method, and BMI.

Table 5. Univariable and multivariable Cox regression analyses of basic characteristics with OS in AEGJ of the Taixing dataset (Discovery, N = 117).
a aHR with adjustment for TILs, age, sex, grade of differentiation, TNM staging, first-line treatment method, and BMI.

Table 6. Univariable and multivariable Cox regression analyses of basic characteristics with OS in GAC of the Taixing dataset (Discovery, N = 148).
a aHR with adjustment for TILs, age, sex, grade of differentiation, TNM staging, first-line treatment method, and BMI.

Table 7. Univariable and multivariable Cox regression analyses of basic characteristics with OS in ESCC of the Taixing dataset (Discovery, N = 418).
a aHR with adjustment for TILs, age, sex, grade of differentiation, TNM staging, first-line treatment method, and BMI.

Table 8. Univariable and multivariable Cox regression analyses of basic characteristics with OS in AEGJ of the TCGA dataset (Validation, N = 169).
a aHR with adjustment for TILs proportion, age, sex, first-line treatment method, grade of differentiation, and TNM staging.