Value of Artificial Intelligence in Evaluating Lymph Node Metastases

Simple Summary: In surgical pathology, the assessment of the presence of lymph node metastases is a key aspect in terms of the staging and prognosis of cancer patients. This type of work is time-consuming and prone to error. Owing to digital pathology, artificial intelligence (AI) applied to whole slide images (WSIs) of lymph nodes can be exploited for the automatic detection of metastatic cells, so this task can be automated and standardized, increasing diagnostic quality. This manuscript aims to systematically review the published literature regarding the application of various artificial intelligence systems for the assessment of metastases in lymph nodes in whole slide images.

Abstract: One of the most relevant prognostic factors in cancer staging is the presence of lymph node (LN) metastasis. Evaluating lymph nodes for the presence of metastatic cancerous cells can be a lengthy, monotonous, and error-prone process. Owing to digital pathology, artificial intelligence (AI) applied to whole slide images (WSIs) of lymph nodes can be exploited for the automatic detection of metastatic tissue. The aim of this study was to review the literature regarding the implementation of AI as a tool for the detection of metastases in LNs in WSIs. A systematic literature search was conducted in PubMed and Embase databases. Studies involving the application of AI techniques to automatically analyze LN status were included. Of 4584 retrieved articles, 23 were included. Relevant articles were labeled into three categories based upon the accuracy of AI in evaluating LNs. Published data overall indicate that the application of AI in detecting LN metastases is promising and can be proficiently employed in daily pathology practice.


Introduction
The incidence of cancer has been increasing worldwide due to a growing and aging population coupled with the adoption of screening programs [1]. One of the most relevant prognostic factors for cancer patients is the presence of lymph node (LN) metastasis. Metastatic disease is an important feature that impacts patient clinical staging and treatment decisions [2]. However, having pathologists manually review LNs microscopically for the presence of metastatic tumor cells is a tedious, time-consuming, and potentially error-prone process. Additionally, many hospitals require intraoperative examinations of sentinel LNs on frozen sections for guiding surgical procedures (Figure 1) [3]. Currently, pathologists may be required to screen a large number of slides of lymph nodes, often including immunohistochemical (IHC) stains in addition to conventional hematoxylin and eosin (H&E)-stained sections. This has increased the workload for surgical pathologists. Recently, whole slide imaging (WSI) has made digital pathology (DP) more useful for primary diagnosis [4,5] and non-clinical purposes (e.g., education and research) [6][7][8]. By transforming glass slides into WSIs, it has now become possible to use computerized digital image analysis systems including artificial intelligence (AI)-based deep learning (DL) algorithms to analyze digital slides [9]. DP has demonstrated a strong performance in various tasks in different fields, including pathology applications, some of which have already been approved by the Food and Drug Administration (FDA) [10]. In recent years, many AI-based algorithms have been created for the automatic detection of metastases in LNs in WSIs. Such novel technology exhibits the potential to reduce pathologists' workload and increase diagnostic accuracy. The aim of this manuscript is to systematically review the published literature regarding the application of various AI systems for the assessment of metastases in LNs in WSIs.
In the next sections of the paper, the search strategy and the main characteristics of the retrieved studies are reported in the Materials and Methods and the Results, respectively. The included papers are then further analyzed in specific paragraphs of the Discussion section according to the following fields of application: (i) the value of AI in lymph node metastases of breast cancer, (ii) the results of public challenges employing and comparing different AI tools, (iii) the value of AI in lymph node metastases detection during intraoperative consultation, and (iv) the role of AI in identifying nodal metastases of tumors apart from breast cancer.

Materials and Methods
A systematic review of the literature was conducted, without language restrictions, according to the guidelines for Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA) [11] and Meta-Analysis of Observational Studies in Epidemiology (MOOSE) [12]. The PubMed and Embase databases were systematically searched until August 2022 to identify any study regarding the application of image analysis and/or AI for the detection of lymph node metastases. The search strategy comprised a combination of terms including "image and analysis", "artificial and intelligence", "morphometry", "histomorphometric", "neural network", "convolutional", "computational", "deep learning", "automated", "machine learning", "lymph node", "metastasis", "WSI", and their spelling variations. The complete search strategy for these databases is detailed in Table S1. Two authors reviewed all article titles and abstracts with the aid of the Rayyan QCRI reference manager web application [13]. Eligibility of published studies was determined independently by two reviewers, with disagreements resolved through consensus. Only full texts of the articles fulfilling the initial screening criteria were acquired and reviewed. Inclusion criteria encompassed the application of any kind of AI as a tool for the automatic detection of metastasis in LNs.
Studies represented only by abstracts were excluded, as well as reviews and published letters to the editor with no original data. Two investigators independently extracted data from the included studies with a standardized form. Data extracted included author(s), publication year, country of origin for the research, type of metastatic cancer, type of AI employed, main results, and limitations of the study. As for the performance of the AI systems, three categories were defined based on whether the precision, accuracy, sensitivity, specificity, or area under the curve (AUC) of the receiver operating characteristic (ROC) curve was higher or lower than a 95% cut-off. This threshold was chosen because it is equivalent to two standard deviations (σ). A shade of green was assigned according to the results achieved by the algorithm proposed by the authors. As the color gradation intensifies, the accuracy of AI for the identification of LN metastases increases as follows: (i) light green: every parameter <95%, (ii) medium green: at least one parameter >95%, and (iii) dark green: all parameters >95%.
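The three-shade grading described above can be expressed as a small rule. The following Python sketch is illustrative only: the function name `performance_tier` and the dictionary input are hypothetical, and treating a value of exactly 95% as "not above the cut-off" is an assumption, since the text does not say how ties are handled.

```python
from typing import Mapping

CUTOFF = 0.95  # the 95% cut-off used for grading


def performance_tier(metrics: Mapping[str, float], cutoff: float = CUTOFF) -> str:
    """Return the shade of green for a set of reported metrics (as fractions).

    Assumption: a metric equal to the cut-off counts as not exceeding it.
    """
    if not metrics:
        raise ValueError("at least one reported metric is required")
    above = [value > cutoff for value in metrics.values()]
    if all(above):
        return "dark green"      # all parameters >95%
    if any(above):
        return "medium green"    # at least one parameter >95%
    return "light green"         # every parameter <95%


# Inputs below reuse figures quoted elsewhere in this review as examples.
print(performance_tier({"sensitivity": 1.00, "specificity": 0.759}))
print(performance_tier({"auc_macro": 0.9993, "auc_micro": 0.9956}))
print(performance_tier({"auc_itc": 0.7828}))
```

The printed shade is simply what this sketch yields for the given numbers; it is not necessarily the shade assigned to the corresponding study in the tables.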

Results
A flow diagram of the screening and exclusion of all the articles is shown in Figure 2. In more than half of the included studies (16/23, 70%), the investigated LNs were from breast cancer series. Nineteen (19/23, 83%) of the included studies relied only on H&E stains, while three (3/23, 13%) of them used only IHC for cytokeratin and one study coupled H&E with deep UV excitation fluorescence microscopy [14]. As for the employed AI technology, it is worth mentioning that three studies (3/23, 13%) employed a combination (ensemble) of DL algorithms, ranging from four to thirty-seven, for improving the evaluation of breast cancer metastases in LNs. For this reason, the so-called 'challenge' studies are presented in a separate table (Table 2). DL-based computational pathology approaches required either manual annotation of regions of interest (ROI) on WSIs in fully supervised settings or large datasets with slide-level labels in a weakly supervised setting. Both methods required a training dataset. Nineteen (19/23, 83%) of the included studies used a fully supervised approach, while the remaining four (4/23, 17%) algorithms utilized weak supervision [15][16][17][18]. Most of the studies (21/23, 91%) had a WSI training set, whereas two studies (2/23, 9%) [19,20] did not employ a training set, as the algorithm was directly able to recognize neoplastic cells stained by the IHC cytokeratin assay. A WSI validation (hold-out) data set was available in three of the papers (3/23, 13%). The number of slides used for the training set ranged from 36 to 1963. In the weakly supervised setting, the number of slides of the training set was much higher than in algorithms that relied upon pixel-level annotations.
In terms of AI performance, 11 (11/20, 55%) algorithms achieved a performance of >95% across all parameters (precision, accuracy, sensitivity, specificity, or area under the receiver operating characteristic curve), with all four of the weakly supervised studies reaching this goal. As for the detection of isolated tumor cells (ITC), the 95% cut-off was almost never reached; rather, for these settings the AUC rates ranged from 0.575 [21] to 0.9228 [18]. Only one study reached a 100% detection rate for ITC, but at the cost of 0% specificity [22]. Among the different employed AI systems, three (3/20, 15%) were programmed to recognize IHC cytokeratin stains, while the others analyzed only H&E slides. Twelve of the latter studies (12/20, 60%) used Convolutional Neural Network (CNN) algorithms built on pre-existing platforms, including GoogLeNet, AlexNet-GRU, ResNet, DenseNet, MobileNetV2, and LYNA, among others, and five (5/20, 25%) were algorithms designed ex novo. Finally, three of the included studies (3/20, 15%) simultaneously tested different AI algorithms utilizing the same dataset of H&E-stained WSIs [14,23,24].

Discussion
AI development is progressing rapidly, with the introduction of several models that can perform tasks such as the detection and segmentation of various malignancies, including breast, pharynx, and thyroid carcinoma [27][28][29][30], as well as the analysis of non-tumoral specimens, among others [31,32].
LN metastases are one of the most important prognostic factors for staging malignancies [2]. The histological evaluation of nodal specimens must be performed with care and precision. However, this work, conducted manually by pathologists, is often a protracted, tedious, and possibly error-prone process that could benefit from the aid of digital pathology and AI-based algorithms designed to assist with screening LNs for metastatic disease. Indeed, the application of digital techniques to help detect LN metastases may allow pathologists to reduce turn-around times and increase their diagnostic accuracy. Several AI-based tools have been developed in the last decades for addressing this relevant issue, which are further discussed in the following sections, according to their fields of application.

AI and Nodal Breast Cancer Metastases
The majority (70%) of the included studies in our systematic review focused on the evaluation of sentinel LNs in breast cancer patients. While localized breast cancer has a five-year survival rate of >95%, the presence of LN metastases drops the survival rate to 85% [33]. Based on the diameter of clusters of tumor cells, metastases can be divided into three categories: macrometastases, micrometastases, or isolated tumor cells (ITC), which reflect the "N" classification of breast cancer staging according to the eighth edition of the TNM staging criteria [2]. The biological significance of ITC is debated and, according to the WHO, LNs containing only ITC are currently excluded from the total positive nodal count for the purposes of the N classification [33].
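As a rough illustration of the size-based categories and of the exclusion of ITC-only nodes from the positive count, consider the following Python sketch. The 0.2 mm and 2 mm limits follow the sizes quoted in this review (ITCs <0.200 mm; clusters of 0.2 to <2 mm); the function names, the handling of values exactly on a boundary, and the omission of cell-count criteria are assumptions, and staging manuals should be consulted for the authoritative definitions.

```python
from typing import Iterable


def metastasis_category(diameter_mm: float) -> str:
    """Classify the largest tumor deposit in a node by its diameter in mm.

    Assumed boundaries: <0.2 mm is ITC, 0.2 to <2 mm is a micrometastasis,
    and 2 mm or more is a macrometastasis. A diameter of 0 means no tumor.
    """
    if diameter_mm < 0:
        raise ValueError("diameter cannot be negative")
    if diameter_mm == 0:
        return "negative"
    if diameter_mm < 0.2:
        return "ITC"
    if diameter_mm < 2.0:
        return "micrometastasis"
    return "macrometastasis"


def positive_node_count(largest_deposit_mm: Iterable[float]) -> int:
    """Count nodes with micro- or macrometastases; ITC-only nodes are not counted."""
    counted = {"micrometastasis", "macrometastasis"}
    return sum(1 for d in largest_deposit_mm if metastasis_category(d) in counted)


# Four hypothetical nodes: negative, ITC only, micrometastasis, macrometastasis.
print(positive_node_count([0.0, 0.1, 0.5, 3.0]))
```

In this hypothetical example, only the last two nodes contribute to the positive nodal count.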
In 2003, Weaver et al. [19] were among the first investigators to use an automatic LN metastasis detection system on WSIs. Their system was based on a sensor capable of recognizing cells stained by IHC for cytokeratins. Those authors showed that their tool identified 19 of 20 (95.0%; 95% CI 75% to 100%) cases with micrometastases. In the only case where micrometastases were missed, the cancer cells were placed outside the physical limits of slide scanning for the instrument. It is also important to note that slides with excessive stain debris could not be analyzed by the system [19]. The use of IHC-stained WSIs was also reported by Clarke et al. [21] and Holten-Rossing et al. [20]. The models of Clarke et al. [21] reached sensitivities of 57.5% for ITCs (<0.200 mm), 89.5% for micrometastases, and 100% for larger metastases, while Holten-Rossing and colleagues [20] achieved a sensitivity of 100%, with no false negatives. Despite its advantages, this IHC-based method requires longer staining times and is subject to increased complexity. However, it could be particularly useful in specific settings, such as patients who underwent neoadjuvant chemotherapy, where nodal tissue may show drug-induced changes or an inflammatory/fibrotic response.
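The interval quoted for 19 of 20 detected cases can be reproduced with an exact binomial calculation. The sketch below assumes a two-sided Clopper-Pearson interval, which is an assumption: the text does not state which method the original authors used. The function names are hypothetical, and the bounds are found by simple bisection using only the standard library.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(f, target: float) -> float:
    """Find p in [0, 1] with f(p) = target, for f increasing in p (bisection)."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact two-sided confidence interval for a binomial proportion k/n."""
    # Lower bound: P(X >= k | p) = alpha/2; upper bound: P(X <= k | p) = alpha/2.
    lower = 0.0 if k == 0 else _solve_increasing(
        lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2)
    upper = 1.0 if k == n else _solve_increasing(
        lambda p: 1 - binom_cdf(k, n, p), 1 - alpha / 2)
    return lower, upper


low, high = clopper_pearson(19, 20)
print(f"{low:.1%} to {high:.1%}")
```

Under this assumption the bounds come out at roughly 75% and just under 100%, in line with the interval reported above.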
Undoubtedly, a turning point in the research of AI-based tools for detecting LN metastases came in 2016 with the CAMELYON16 challenge (Cancer Metastases in Lymph Nodes Challenge) [25]. As a result, several improvements in computer programming produced AI capable of analyzing large WSI files. Several AI-based algorithms were developed for detecting metastases of various tumors. Among these was an interesting algorithm owned by Google called LYNA (LYmph Node Assistant), which highlights areas suspicious for the presence of metastatic cells. LYNA was reported in two different articles [34,35]. Steiner et al. [34] designed a fully crossed, intermodal, multi-reader study to evaluate the performance metrics for both assisted and unassisted reads. The pathologists in this study interpreted all the images in both modalities, with or without assistance, in two sessions separated by a wash-out period of at least four weeks. The results stated that all pathologists performed better than the algorithm alone with regard to both sensitivity and specificity; however, when they reviewed the images with AI assistance, the average time of review per image was significantly shorter, especially in negative and micrometastatic LNs. Of note, the researchers in the study by Steiner et al. also examined WSIs for ITC. Liu et al. [35] reached results similar to those of the algorithm that performed best in the CAMELYON16 challenge (slide-level AUC 99.3 vs. 99.4). Relying on LYNA, through an exhaustive screen of each slide at high-power magnification, the authors propose the application of this AI in screening LNs by highlighting ROIs. In a second phase, these areas could be evaluated by single pathologists, ignoring false positives and interpreting only the true positive regions. Another algorithm, proposed by Khalil et al. [36], took between 2.4 and 9.6 min per WSI to detect metastasis, depending on the number of graphics processing units (GPUs) used.
As highlighted in Table 1, most of the algorithms in our review appeared to struggle with the task of detecting isolated tumor cells. In order to achieve high sensitivity for small ITCs, AI-based digital tools likely have to allow for a higher rate of false positive results. Indeed, examples of tissue misdiagnosed as micrometastasis include hypertrophic lymphoid follicles, reactive venules or capillaries, and macrophages. Non-histological contaminants such as paraffin debris, bubbles, or stains can also be incorrectly interpreted as a metastasis by the algorithm. Such errors, however, can be addressed by incorporating oversight of the results by an experienced pathologist. For discrepant cases, additional H&E-stained sections or IHC can be performed. In the future, novel technology such as non-destructive 3D pathology may be used and coupled with the application of AI [37].

Public Challenges
One mechanism that has facilitated the development of AI algorithms has been through public challenges for specific tasks. Since the first challenges in 2007, the number of challenges per year has steadily been increasing [38]. As noted, a key turning point in this field occurred due to the CAMELYON16 challenge [25] where humans were compared with AI models to detect LN metastases by measuring the time required for reaching a correct diagnosis. The algorithms developed for this public challenge performed better than the 11 involved pathologists in identifying micrometastases. For example, when there was a time limit imposed for the detection of a tumor cell cluster of a diameter 0.2 to <2 mm, the AUC for the best algorithm was 0.994 (95% CI, 0.983-0.999) versus a mean AUC for the pathologists of 0.810 (range, 0.738-0.884; p < 0.001). Nevertheless, the AI models did not surpass the pathologists if no time restriction was applied (AUC = 0.943). A limitation of the CAMELYON16 competition was that WSI with ITCs were not provided. Further efforts were made with the CAMELYON17 challenge [26]. For this subsequent public challenge, the dataset of slides was divided into 100 artificial patients representing different pN stages for assessing the ability of the participating algorithms to perform automatic pN staging and getting closer to a real-world simulation. In general, the AI systems created were not only able to detect the presence of metastases, but were also able to measure their extent, including ITC, and hence better determine an accurate pN-stage.
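Since the comparisons above are expressed as AUC values, a brief reminder of what the number measures may help. The Python sketch below computes the AUC as the probability that a randomly chosen positive slide receives a higher score than a randomly chosen negative one (the Mann-Whitney formulation). The function name and the toy labels and scores are hypothetical; they are not data from any of the reviewed studies.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels: 1 for slides with metastasis, 0 for negative slides.
    scores: model output, higher meaning more suspicious. Ties count as half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Toy example: one positive slide scores below one negative slide.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

An AUC of 1.0 would mean every positive slide outranks every negative slide, whereas 0.5 corresponds to chance-level ranking.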
However, even the best combination of algorithms correctly classified only 77% of patients at the slide level, where the best-ranked team wrongly classified 13.4% of the slides in the test set. Overall, ten slides containing micrometastases and four slides containing macrometastases were missed [26], which would of course be unacceptable in clinical practice. One putative strategy to overcome such a problem would be to increase the sensitivity of these AI systems, even though this could increase the number of false positives. These early results imply that perhaps algorithms developed for research purposes to automatically detect LN metastatic disease are not yet ready to be fully adopted in daily practice.
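The trade-off between sensitivity and false positives mentioned above can be made concrete by moving the decision threshold of a classifier. The following Python sketch uses made-up slide scores (not data from any reviewed study) and a hypothetical helper, `sens_spec`, to show that lowering the threshold recovers a missed positive slide while flagging more negative slides.

```python
from typing import Sequence, Tuple


def sens_spec(labels: Sequence[int], scores: Sequence[float],
              threshold: float) -> Tuple[float, float]:
    """Sensitivity and specificity when slides scoring >= threshold are called positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical scores: four positive slides followed by six negative slides.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.95, 0.80, 0.55, 0.30, 0.60, 0.40, 0.35, 0.20, 0.10, 0.05]

for threshold in (0.50, 0.25):
    sensitivity, specificity = sens_spec(labels, scores, threshold)
    print(f"threshold={threshold:.2f} "
          f"sensitivity={sensitivity:.2f} specificity={specificity:.2f}")
```

In this toy setting, the lower threshold catches every positive slide but roughly halves the specificity, which is the kind of cost that pathologist review of flagged regions would need to absorb.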

Intraoperative Consultation
Only a few studies investigated the implementation of AI to assist with frozen sections. The intraoperative evaluation of a sentinel LN is particularly demanding in this clinical setting, even for an experienced pathologist, because of artifacts such as tissue compression, nuclear ice crystals, sections with folds, and stain nuances which differ from formalin-fixed paraffin-embedded (FFPE) material, overall leading to inferior image quality [39]. Kim et al. [40,41] proposed a transfer learning approach to effectively train their CNN model for the identification of metastatic breast cancer cells on frozen tissue section digital slides. Transfer learning relies on the re-utilization of a model trained on one task for a second, related task by adding modifications. The authors exploited data of annotated WSIs from FFPE samples to train CNNs working on frozen sections. The best algorithms detected metastasis with an AUC of 0.805 and a processing time of 10.8 min. In conventional (human) frozen section examinations, the time between sample receipt and a rendered pathological diagnosis typically spans from 20 to 30 min, including gross specimen inspection, tissue freezing, sectioning, staining, and microscopic examination [42]. For a digital workflow, the time for scanning frozen sections may vary depending on the size, type of scanning machine, magnification, and focus z-stacking, but it generally ranges from 3 to 9 min [42,43]. Therefore, the application of an efficient AI system in the intraoperative consultation setting, despite increased diagnostic times, could be useful in particularly demanding cases, especially when a second opinion by a remotely located colleague is required. For these reasons, this technology is likely suitable for use in routine practice.

Tumors Other Than Breast Cancer
Our review identified seven studies where metastatic disease in LNs was from non-breast cancer series, including gastric cancer (3/7), squamous carcinoma (2/7), colorectal carcinoma (1/7), and lung cancer (1/7). For gastric cancer metastatic LN series, Matsumoto et al. [14] combined H&E with deep UV excitation fluorescence microscopy and tested different AI models on both types of images acquired. Their results were excellent (AUC = 98.8), and these authors also demonstrated that automated analysis of fluorescence images detected LN metastasis as accurately as analysis of H&E images. In the 222-patient cohort of Hu et al. [23], 51 patients were treated with neoadjuvant chemotherapy (NACT). In this particular study, the researchers demonstrated that their AI model was effective and can accordingly be confidently used for LN screening after NACT. Huang et al. [18] obtained a slide-level AUC of 0.9936 for gastric LN metastasis using a weakly supervised algorithm based on the ResNet50 architecture. Moreover, their proposed method significantly enhanced the sensitivity of ITC recognition and micrometastases identification while shortening the review time per slide. In 2020, Pan et al. [44] employed an algorithm developed on a slide set of metastatic esophageal squamous cell carcinoma to screen LNs suspicious for metastases from the throat and lung. By relying on transfer learning, the applied AI tool reached an accuracy of 96.7% and 90%, respectively, for each type of cancer. Nodal metastases from head and neck squamous cell carcinoma cases were also tested by the algorithm developed by Tang et al. [24] in 2020, achieving even higher sensitivity rates (100%), but with lower specificity (75.9%). Chuang et al. [16], exploiting a ResNet50 model, built a weakly supervised algorithm that performed well in identifying both macro- and micrometastases from colorectal cancer, with an AUC of 0.9993 and 0.9956, respectively, at the slide level. However, when focusing on ITC, the AUC dropped to 0.7828. Finally, in 2019, Pham et al. [22] employed an AI tool called HALO to recognize metastases of nearly all subtypes of lung carcinoma (excluding small cell lung cancer), achieving very high sensitivity rates but with lower specificity.

Limitations
One key limitation of all the herein studied AI systems is that they were designed to detect just one main pathological lesion (i.e., the detection of metastatic cells), so that they were unable to recognize other rare, but still relevant, key histological features such as co-occurring pathologies (e.g., lymphoma or infection) involving LNs. Secondly, the reported AI tools frequently faced difficulties when metastatic foci were particularly small in size (e.g., ITCs). Finally, DL-based algorithms were often expensive to develop and deploy, therefore hindering the widespread use of this technology.

Conclusions
The published data overall indicate that the application of AI in detecting LN metastasis, with due care, is feasible for routine clinical practice in the near future. Ideally, AI-based automated analysis of LNs would assist pathologists by screening these samples and thereby augment diagnostic pathology reporting and tumor staging. A high sensitivity rate is ideally required for these novel AI systems to reach this goal. Further studies are warranted to improve the performance and workflow of this promising technology, in order to validate their adoption in the routine workflow of pathology laboratories.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/cancers15092491/s1, Table S1: Search strategy for electronic databases.

Institutional Review Board Statement: Ethical review and approval were waived for this study as no ethical issue was raised by reviews.