Noninvasive Technologies for the Diagnosis of Squamous Cell Carcinoma: A Systematic Review and Meta-Analysis

Early cutaneous squamous cell carcinoma (cSCC) diagnosis is essential to initiate adequate targeted treatment. Noninvasive diagnostic technologies could overcome the need for multiple biopsies and reduce tumor recurrence. To assess the performance of noninvasive technologies for cSCC diagnostics, 947 relevant records were identified through a systematic literature search. Among the 15 studies selected for this systematic review, 7 were included in the meta-analysis, comprising 1144 patients, 224 cSCC lesions, and 1729 clinical diagnoses. Overall, the sensitivity values are 92% (95% confidence interval [CI] = 86.6–96.4%) for high-frequency ultrasound, 75% (95% CI = 65.7–86.2%) for optical coherence tomography, and 63% (95% CI = 51.3–69.1%) for reflectance confocal microscopy. The overall specificity values are 88% (95% CI = 82.7–92.5%), 95% (95% CI = 92.7–97.3%), and 96% (95% CI = 94.8–97.4%), respectively. Physician’s expertise is key for high diagnostic performance of the investigated devices. This can be justified by the provision of additional tissue information, which requires physician interpretation, despite insufficient standardized diagnostic criteria. Furthermore, few deep learning studies were identified. Thus, integration of deep learning into the investigated devices is a promising field of investigation in cSCC diagnosis.


INTRODUCTION
Cutaneous squamous cell carcinoma (cSCC) is the second most common form of nonmelanoma skin cancer (NMSC). Globally, North America holds the second highest incidence rate after Australia and New Zealand (Sung et al, 2021), with around 1 million surgical procedures performed to treat cSCC in 2012 in the United States alone (Rogers et al, 2015). Critically, the incidence rate is expected to increase by approximately 40% in Europe and 90% worldwide by 2040 (Ferlay et al, 2023).
Although cSCC is not often deadly, it can cause severe morbidity. The majority of cSCC is located in the head and neck area, and substantial excision is needed to treat the disease at a late stage, which can result in deformity (Pellacani and Argenziano, 2022). When surgical excision is not possible, immunotherapy and chemotherapy are possible treatment choices. It may be challenging to distinguish between different actinic keratosis (AK) grades, in situ cSCC/Bowen's disease (BD), or even invasive squamous cell carcinoma (iSCC), but doing so is crucial for choosing the right course of treatment and avoiding multiple biopsies (Lomas et al, 2012; Ulrich et al, 2016).
Noninvasive medical devices emerged in the 1980s to improve skin cancer diagnostics in routine dermatology practice (Guida et al, 2019). Because of the advancement in dermoscopy, diagnoses have become more precise (Pellacani and Argenziano, 2022). Recently, sophisticated techniques have been introduced in clinical practice, including optical coherence tomography (OCT), reflectance confocal microscopy (RCM), and high-frequency ultrasound (HFUS). They differ in terms of resolution; penetration depth; and, therefore, clinical applicability (Marneffe et al, 2016). Although these methods surpass dermoscopy in their capabilities, they necessitate further expertise and have yet to attain widespread adoption.
Despite noninvasive technologies predominantly being developed for melanoma diagnostics, their gradual advancements have also begun to explore keratinocyte carcinomas, highlighting their potential relevance in clinical practice (Fink and Haenssle, 2017). Therefore, we conducted an extensive systematic review and meta-analysis of studies that investigate the use of noninvasive technologies in the diagnosis of cSCC. Subsequently, sensitivity and specificity of diagnostic tests were assessed, and the latest advances in this area are reported.

RESULTS
A total of 947 records were identified as relevant through database search. After removing 168 duplicates, 548 records were excluded during title screening. Using Covidence, 231 records were screened by abstract, and subsequently, 77 records were selected for full-text assessment. Almost half of the exclusions were due to unreported or insufficient data for the 2 × 2 contingency table, as shown in Figure 1.
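The record flow described above can be checked with simple arithmetic. The following is a minimal sketch in Python, not part of the original analysis; the variable names are illustrative, the counts of 15 reviewed studies and 7 meta-analyzed studies are taken from elsewhere in this article, and the derived exclusion counts assume that each stage draws only on the records that survived the previous one.

```python
# Counts as reported in the text; variable names are illustrative only.
identified = 947
duplicates_removed = 168
excluded_at_title = 548
screened_by_abstract = 231
assessed_full_text = 77
included_in_review = 15         # reported in the abstract
included_in_meta_analysis = 7   # reported in the abstract and Results

# Records remaining after each early stage.
after_deduplication = identified - duplicates_removed
after_title_screening = after_deduplication - excluded_at_title

# The reported number of abstracts screened matches the remaining records.
assert after_title_screening == screened_by_abstract

# Derived exclusions (assumption: no records entered from other sources).
excluded_at_abstract = screened_by_abstract - assessed_full_text
excluded_at_full_text = assessed_full_text - included_in_review
excluded_from_meta_analysis = included_in_review - included_in_meta_analysis

print(after_deduplication, after_title_screening)
print(excluded_at_abstract, excluded_at_full_text, excluded_from_meta_analysis)
```

The derived exclusion counts are not stated explicitly in the text and should be read against Figure 1 rather than taken as reported values.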

Studies' characteristics
Noninvasive diagnostic technologies such as photoacoustic, fluorescence, and hyperspectral imaging are reported in experimental or pilot studies. Most were initially developed for melanoma diagnosis and subsequently used to explore the detection of NMSC and of premalignant and benign skin lesions as well as tumor surgical margin control. The full list of methods found during screening is provided in Table 1.
Nevertheless, for the meta-analysis, 8 studies were excluded. Two studies (Boone et al, 2015, 2016) were excluded owing to lack of a test set; only information about the training set is reported. The authors were contacted but did not respond. Three studies (Díaz et al, 2019; Feng et al, 2018; Han et al, 2018) were excluded owing to paucity of data (<2 studies of each modality). Díaz et al (2019) investigated multimodal imaging using an automatic procedure for early skin-cancer screening by dynamic thermal imaging. Feng et al (2018) investigated Raman spectroscopy, and Han et al (2018) is the only study selected in the review that investigated computer-assisted diagnosis (CAD) systems, more specifically, image classification of 12 skin diseases using a convolutional neural network (CNN).
Subsequently, 2 studies (Rodriguez-Diaz et al, 2019; Silveira et al, 2020) were excluded owing to intense variability of defined spectra and methods, which would make a group under the term optical spectroscopy strongly heterogeneous and inaccurate. Finally, Longo et al (2013) considered the endpoint as excision or no excision instead of classifying according to the specific type of tumor and, therefore, was also excluded.
The meta-analysis therefore evaluates the sensitivity and specificity of OCT/HD-OCT, RCM, and HFUS with Doppler for cSCC diagnosis in 7 studies with 7 cohorts, corresponding to 1144 patients and 224 cSCCs. Considering that 2 studies (Marneffe et al, 2016; Rao et al, 2013) addressed evaluation by more than 1 clinician, in total, 1729 clinical diagnoses are reported. Cohort description is presented in Table 3.

Quality assessment
A summary of the overall methodological quality of all included study cohorts is presented in Figure 2. Regarding reference standard and flow and timing, studies mostly present low or unclear risk of bias. Selective participant recruitment (5 studies), ambiguous reference test blinding (6 studies), and exclusions brought on by low image quality and large tumor thickness are observed. Index test interpretation mostly shows a low risk of bias (7 studies) because it was frequently done remotely using images, and the observer/reader was blinded to any clinical information that would ordinarily be available in practice. In summary, all included studies stated that examiners were blinded to clinical and histopathological information, with the exception of Jerjes et al (2021). Physicians also do not overlap between studies. Finally, the applicability of the findings is not a major concern with regard to the concordance of the research question and the selected studies.

Findings
Studies included in the meta-analysis are clustered in 3 groups: OCT, RCM, and HFUS. These technologies are dependent on physicians to interpret the produced images. Their accuracy varies according to diagnostic criteria and physicians' experience (Table 4). Consequently, for the purposes of statistical analysis, cohorts that are evaluated by >1 physician (Marneffe et al, 2016; Rao et al, 2013) are separately assessed (Figure 3).

DISCUSSION
In this meta-analysis, the findings involve 3 categories of noninvasive diagnostic tools of cSCC. Studies with trained dermatologists, radiologists, or consensus in diagnosis achieve the highest sensitivity and specificity regardless of the degree of suspicion of included lesions. The majority of the studies encompass the clinical practice reality by examining malignant and benign skin lesions, with a particular focus on the diagnosis of malignant melanoma and basal cell carcinoma (BCC). In addition, these studies simultaneously investigate the differential diagnoses within the spectrum of cSCC, which includes AK, BD, and keratoacanthoma. Dinnes et al (2018b) investigated the accuracy of visual inspection and in-person dermoscopy. They reported pooled sensitivity and specificity of 57% (95% CI = 53–61%) and 79% (95% CI = 77–81%) for visual inspection, respectively, whereas pooled sensitivity and specificity for in-person dermoscopy were 55% (95% CI = 29–79%) and 84% (95% CI = 32–98%), respectively. However, the paucity of studies led them to state that a reliable conclusion could not be made.
In the OCT group, sensitivity and specificity range from 43.8 to 96.0% and from 90.0 to 98.9%, respectively. The highest accuracy is achieved by previously trained dermatologists. Variations of OCT for skin imaging found in this review such as high definition, full field, line field, dynamic, and single fiber provide different degrees of tissue penetration, resolution, 2-dimensional or 3-dimensional images, information on vascular structures, and tissue coordinates according to probe characteristics (Chuchvara et al, 2021). To improve quantitative evaluation, the Berlin score was initially developed for BCC diagnostics. However, it still needs refinement and has not yet been validated (Wahrlich et al, 2015). The only other approach along these lines is by Marneffe et al (2016), in which applied nonvalidated diagnostic criteria did not reduce accuracy variation.
This review also revealed a higher number of studies that deployed bedside diagnostic devices, possibly because of insurance coverage, since RCM is reimbursed as an additional diagnostic method in the United States (Centers for Medicare and Medicaid Services, 2017). This financial provision could therefore lead to wide clinical adoption.
The experience of the physicians in the RCM studies does not clearly correlate with high accuracy, possibly owing to discrepancy in sample size and absence of information on experience. Rao et al (2013) presented discordance between experience and specificity, with the least experienced physician presenting higher specificity. This could in principle be explained by the different sizes of samples analyzed. However, of 334 lesions, the first physician reported by Rao et al (2013) evaluated 317, whereas the second physician (Rao et al, 2013) evaluated 323, a small discrepancy that is not enough to justify such a finding. We also observed that the first physician (Rao et al, 2013) tended to diagnose more BCC instead of cSCC, a common differential diagnosis that could have contributed to a high specificity. When analyzing both metrics, the first physician (Rao et al, 2013) had a lower accuracy than the second. A previous review's findings presented limited data on the target condition but suggested sensitivity in the range of 74.0–77.0% with high specificity of 92.0–98.0% (Dinnes et al, 2018c).
HFUS with color Doppler presented convergent accuracy in both studies, probably owing to similarity in sample size and described features. On the one hand, Chen et al (2022) developed a model on the basis of ultrasound features with additional Doppler information, whereas sample selection was restricted to high-risk BCC and SCC. On the other hand, Zhu et al (2021) demonstrated accuracy on the basis of radiologist agreement. Samples included the continuum AK, BD, and iSCC (Dinnes et al, 2018a).
A comparison of technical specifications between the technologies investigated was compiled. Features include device system, wavelength, optical lateral and axial resolution, image resolution, penetration depth, field of view, and approximate imaging time (Table 5) (Agfa HealthCare NV, 2014; Cao and Tey, 2015; Chen et al, 2022; Mavig, 2018).
Previously, Malvehy et al (2014) investigated Nevisense system performance and achieved sensitivity of 100.0% and specificity of 43.4% for cSCC diagnosis. These findings could be justified by a melanoma-focused recruitment with few differential cSCC diagnoses. This system is, however, trained for detecting pigmented lesions, which is not cSCC's classical clinical presentation.
Given the recent advancements of integration of deep learning into CAD systems in melanoma diagnosis (Ferrante di Ruffano et al, 2018), we expected to find more substantial research also targeting cSCC. The only study (Han et al, 2018) found in this review that investigates CNN reports a specificity range of 82.0–90.2% and sensitivity of 74.3–80.0% in 2 different test sets. However, endpoint diagnosis was benign versus malignant.
The scarcity of identified deep learning studies could reflect the numerous challenges in its adoption (Chan et al, 2020). On the one hand, the investigated devices are already deployed within existing clinical workflows and used for the diagnosis of equivocal lesions. On the other hand, integration of deep learning into the investigated devices lacks substantial scientific publication in cSCC diagnostics.
To improve future studies' design and value, we suggest the evaluation of at least 2 observers/readers with different experience levels, inclusion of visual inspection and/or dermoscopy as a control group, a specific focus on the cSCC spectrum, and inclusion of patients in the field of cancerization and those immunosuppressed. A minimum of 5 target lesions is essential for an adequate 2 × 2 contingency table. Finally, the development and refinement of diagnostic criteria contribute to standardization of accuracy studies. An overview of features visualized with those technologies was performed by Danescu et al (2024) and could serve as guidance for future studies.

Strengths and limitations
This review outlines the state of the art on cSCC noninvasive diagnostic tools. To estimate test accuracy across a variety of research populations, a thorough and repeatable investigation of methodological quality was conducted. Furthermore, a literature gap in cSCC diagnostics when concerning CAD systems was identified.
The review's key issues are first related to the inadequate reporting of primary studies; in the case of diagnostic accuracy research, this means reporting according to STARD (Standards for Reporting Diagnostic Accuracy Studies) 2015. Second, not all the records were intended to be test-accuracy studies. Heterogeneous diagnostic criteria and the use of different thresholds also led to paucity of selected studies. Most of the studies are focused on pigmented lesions, leading to a limited sample size of the target condition. Intensely hyperkeratotic lesions are not considered in this review.
Third, studies with a low number of cSCC lesions have a lower chance for false-positive predictions; those with low frequencies have lower chances of false negatives. Therefore, we calculated pretest probability (Table 6) and found it to be highest in studies in the HFUS group. Because pretest probability positively influences the predicted metrics, the results produced should be interpreted with caution.
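As an illustration of the pretest probability mentioned above, the following minimal Python sketch assumes the common definition of pretest probability as the proportion of lesions in a cohort that are histopathologically confirmed cSCC, computed from a 2 × 2 contingency table. The exact calculation used for Table 6 is not described in the text, and the function name and the counts below are hypothetical.

```python
def pretest_probability(tp: int, fp: int, fn: int, tn: int) -> float:
    """Share of all evaluated lesions that truly have the target condition.

    Assumed definition: (TP + FN) / (TP + FP + FN + TN).
    """
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("contingency table is empty")
    return (tp + fn) / total


# Hypothetical cohorts, not data from the included studies.
enriched_cohort = pretest_probability(tp=45, fp=10, fn=5, tn=40)
screening_cohort = pretest_probability(tp=8, fp=5, fn=2, tn=85)

print(f"enriched cohort:  {enriched_cohort:.2f}")
print(f"screening cohort: {screening_cohort:.2f}")
```

A cohort restricted to high-risk or preselected lesions yields a much higher value than a broad screening cohort, which is why metrics from such cohorts are hard to compare directly.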

Applicability of findings to the review question
Owing to the large variation in diagnostic processes and in the type and performance of the equipment used, as well as the small number of selected studies, our results may not offer sufficient data for statistical generalization. The review and meta-analysis methods are reproducible; however, the investigated studies produce metrics that might not be reproducible owing to specific examiners' skills and the particular device model. For a study comparing OCT, RCM, and HFUS within the same cohort, physicians with equivalent experience and homogeneous diagnostic criteria would be pertinent to give a general assessment about their performance.

CONCLUSIONS
In this systematic review and meta-analysis, we found the accuracy of noninvasive skin cancer diagnostic tools to rely strongly on consensus between trained professionals as well as on physicians' interpretation experience, with an extensive learning curve. To overcome expert dependency while improving test accuracy, we suggest further research on noninvasive cSCC detection with CAD systems and their incorporation into bedside diagnostic tools. Research targeting noninvasive tools for the detection of cSCC would substantially benefit from standardization of diagnostic criteria and adherence to reporting guidelines.

MATERIALS AND METHODS
This systematic review and meta-analysis follows the PRISMA (Preferred Reporting Items for Systematic Review and Meta-Analysis) reporting guideline. To prevent data-driven analysis, it was registered with PROSPERO (International Prospective Register of Systematic Reviews) before the start of data collection.

Search strategy and selection process
Six online scientific research databases (PubMed Central, Scopus, Web of Science, Google Scholar, IEEE Xplore, and SciELO) were searched from January to March 2023 for studies published in English since 2013 that investigated adult human subjects. The search strategy was designed in close collaboration with an experienced biomedical librarian (Supplementary Materials and Methods).

Eligibility criteria
Studies were eligible for inclusion if they referred to a prospective or retrospective cohort of adult patients diagnosed with cSCC. The index test had to be performed in vivo before incisional or excisional biopsy was executed. Histopathology was considered as the reference standard test. Studies that only reported contingency data on the basis of malignant versus benign classification were not considered (Table 7).

Data extraction and quality assessment
Two independent reviewers screened records using Covidence and extracted data using a prespecified, customized form (Table 2). Discrepancies were resolved through mutual discussion. In case of relevant missing information, the corresponding author of the respective study was contacted. The included studies were assessed for methodological quality using STARD 2015 and the QUADAS-2 (Quality Assessment of Diagnostic Accuracy Studies 2) tool to evaluate the risk of bias and applicability (Tables 8 and 9).

Outcomes
Our primary outcomes of interest were the sensitivity and specificity of diagnostic devices used for cSCC classification such as OCT, RCM, and HFUS, per group as well as per physician within each group. The secondary outcome was to report currently used technologies in experimental phase or published in pilot studies.

Statistical analysis
Statistical analysis was performed using R, version 4.1.2, with the R package boot, version 1.3-28, as well as the base package stats. We calculated sensitivity and specificity on the basis of the 2 × 2 contingency tables reported in the studies. CIs for sensitivity and specificity were derived using the bootstrap method. For the method-wise sensitivity and specificity, we used the microaveraging approach, summing up all true positives, true negatives, false positives, and false negatives per noninvasive method across all studies, resulting in one combined sensitivity and one combined specificity value per approach. The summary receiver operating characteristic curve was derived using the method suggested by Moses et al (1993). The R package forestplot, version 3.1.1, was used to create the forest plot. The summary receiver operating characteristic plot was created using the R library ggplot2, version 2.4.4.
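The microaveraging and bootstrap steps described above can be sketched as follows. This is an illustrative Python sketch, not the R code used in the study (which relied on the boot package): the function names, the lesion-level resampling scheme, the percentile interval, the number of resamples, and the cohort counts are all assumptions made for the example.

```python
import random


def micro_average(cohorts):
    """Sum the 2 x 2 counts over all cohorts of one method, then compute
    a single combined sensitivity and specificity."""
    tp = sum(c["tp"] for c in cohorts)
    fn = sum(c["fn"] for c in cohorts)
    tn = sum(c["tn"] for c in cohorts)
    fp = sum(c["fp"] for c in cohorts)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


def bootstrap_ci(successes, total, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for a proportion (assumed scheme).

    Resamples the individual binary outcomes with replacement and takes
    the alpha/2 and 1 - alpha/2 quantiles of the resampled proportions.
    """
    rng = random.Random(seed)
    outcomes = [1] * successes + [0] * (total - successes)
    estimates = sorted(
        sum(rng.choices(outcomes, k=total)) / total for _ in range(n_boot)
    )
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical cohorts for one method, not data from the included studies.
cohorts = [
    {"tp": 8, "fn": 2, "fp": 5, "tn": 85},
    {"tp": 30, "fn": 10, "fp": 4, "tn": 56},
]

sens, spec = micro_average(cohorts)
# Sensitivity CI: resample the 50 truly positive lesions (38 detected).
sens_ci = bootstrap_ci(successes=38, total=50)
# Specificity CI: resample the 150 truly negative lesions (141 cleared).
spec_ci = bootstrap_ci(successes=141, total=150)

print(f"sensitivity {sens:.2f}, 95% CI {sens_ci[0]:.2f} to {sens_ci[1]:.2f}")
print(f"specificity {spec:.2f}, 95% CI {spec_ci[0]:.2f} to {spec_ci[1]:.2f}")
```

Because microaveraging pools the raw counts, larger cohorts contribute more to the combined estimate than smaller ones, in contrast to averaging the per-study sensitivities directly.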

ETHICS STATEMENT
This study used already published data and did not directly involve human or animal subjects. The study was registered with PROSPERO (CRD42023392834).

Figure 1. PRISMA diagram for study selection. Provided is the workflow of study filter and selection (Page et al, 2021). PRISMA, Preferred Reporting Items for Systematic Review and Meta-Analysis.

Figure 2. QUADAS-2 risk of bias and applicability. Assessment of risk of bias (left) and applicability (right) according to 4 criteria are shown: flow and timing (dependent on study design; therefore, it is not evaluated for applicability), reference standard test, index test, and patient selection. The number of studies with low, high, or unclear bias can be seen in the respective colored bars. QUADAS-2, Quality Assessment of Diagnostic Accuracy Studies 2.

Figure 3. Forest plot of sensitivity and specificity of noninvasive methods. Metrics are shown for the 3 selected groups: OCT, RCM, and HFUS. Cohorts that were evaluated by more than 1 physician are represented by first author plus a, b, or c. CI, confidence interval; FN, false negative; FP, false positive; HFUS, high-frequency ultrasound; OCT, optical coherence tomography; RCM, reflectance confocal microscopy; TN, true negative; TP, true positive.

Figure 4. Summary plot of overall sensitivity/specificity per method and per physician. Individual studies/physicians are represented by dots. Overall sensitivity and specificity per method (aggregated) are represented by crosses. The size of dots and crosses varies according to sample size. HFUS, high-frequency ultrasound; OCT, optical coherence tomography; RCM, reflectance confocal microscopy.

Table 2 .
Extracted Data Systematic Review CN Garcia et al.

Table 5. Technical Specifications of Noninvasive Technologies

Table 6.
Abbreviations: AK, actinic keratosis; BCC, basal cell carcinoma; BD, Bowen's disease; cSCC, cutaneous squamous cell carcinoma; KA, keratoacanthoma; LP, lichen planus; SK, seborrheic keratosis; SL, solar lentigo. The appended letters a, b, and c correspond to different observers/readers/examiners in the same study. Rao et al (2013) reported that examiners a and b evaluated different numbers of total lesions but the same number of cSCC.

Table 8. Risk of Bias Assessment with QUADAS-2