Tumor detection of the thyroid and salivary glands using hyperspectral imaging and deep learning

The performance of hyperspectral imaging (HSI) for tumor detection is investigated in ex-vivo specimens from the thyroid (N = 200) and salivary glands (N = 16) from 82 patients. Tissues were imaged with HSI in broadband reflectance and autofluorescence modes. For comparison, the tissues were also imaged with two fluorescent dyes. Additionally, HSI was used to synthesize three-band RGB multiplex images representing the human-eye response and Gaussian RGBs, which are referred to as HSI-synthesized RGB images. Using histological ground truths, deep learning algorithms were developed for tumor detection. For the classification of thyroid tumors, HSI-synthesized RGB images achieved the best performance with an AUC score of 0.90. For parotid gland tumors, HSI had the best performance with an AUC score of 0.92. This study demonstrates that HSI could aid surgeons and pathologists in detecting tumors of the thyroid and salivary glands.


Introduction
Thyroid cancer incidence increased significantly worldwide from 1970 to 2012, even though mortality from thyroid cancer decreased [1]. Surgery is the standard treatment for thyroid cancers, and the 5-year survival rate for localized or regional thyroid cancers (excluding the anaplastic variant) is above 90% [2]. The most common malignant tumor of the thyroid is papillary thyroid carcinoma (PTC), comprising 70% of thyroid cancers, and there are several variants of PTC, including conventional, follicular, tall-cell, and oncocytic [3]. The initial diagnosis of thyroid tumors is made with fine-needle aspiration (FNA) biopsy and histological evaluation of the specimen [3]. Follicular tumors are another cytological type of thyroid neoplasm, which includes follicular adenoma, a benign tumor, and follicular thyroid carcinoma (FTC), the malignant form. The requisite diagnostic criterion for follicular carcinoma versus adenoma is definitive invasiveness; because no cytological features can establish the diagnosis of FTC, FNA cannot make this distinction [4]. Medullary thyroid carcinoma (MTC) is a rare form of thyroid cancer, comprising only 4% of thyroid cancers, that occurs sporadically in most cases but can be associated with a familial germline mutation [5].
During thyroid tumor resections, intraoperative frozen section (FS) analysis and pathologist consultation can be useful for determining the extent of the disease and, according to recent American guidelines, may occasionally confirm malignancy and escalate treatment from partial to total thyroidectomy [6]. This is relevant because 15-30% of preoperative FNA biopsies of thyroid tumors may be indeterminate [7]. Whether intraoperative FS provides relevant diagnostic information in thyroid tumor surgery remains controversial, as it can be prone to misdiagnosis [7]. Clinically significant malignancy was found in 4% of cases with benign intraoperative FS reports, compared to 6.8% of cases in which no FS was performed [8]. This translates to a sensitivity of 22% for identifying malignancy in patients with benign FNA [8]. The literature suggests that intraoperative FS may lead to over- or undertreatment of thyroid tumors [7].
Salivary tumors involve the salivary glands, a system of exocrine glands in the mouth that produce saliva to initiate digestion. The major salivary glands are the parotid, the submandibular, and the sublingual salivary glands [9]. The classification of benign and malignant salivary tumors is complex, with over 20 distinct entities according to the most recent standard proposed by the World Health Organization [9][10][11]. Overall, more than 80% of primary tumors of the salivary glands arise in the parotid gland, which is the largest salivary gland [9,11]. Pleomorphic adenoma is the most common benign tumor of the salivary glands (60%) and typically occurs in the parotid glands [11]. Mucoepidermoid carcinoma is the most common malignant neoplasm of the parotid gland [9]. Adenoid cystic carcinoma is a malignant tumor that can occur with equal likelihood in the submandibular and parotid glands [9]. Polymorphous low-grade adenocarcinoma (PLGA) is a rare malignant tumor, commonly found in minor salivary glands of the hard or soft palate [12]. In surgical resection of salivary tumors, the sensitivity of intraoperative FS for detecting malignant parotid gland tumors with benign FNA was only 33%, suggesting difficulty in diagnosing low-grade tumors [13]. Moreover, FS for salivary tumors carries the risk of tumor seeding and may not provide a definitive diagnosis [14]. Nonetheless, the combination of preoperative FNA and intraoperative FS leads to high overall diagnostic accuracy for salivary tumors [15].
Hyperspectral imaging (HSI) is an emerging biomedical technology with potential for image-guided surgery, and it has been used for cancer detection studies both ex-vivo and in-vivo [16][17][18]. HSI has been explored for brain cancer detection in-vivo [19,20]. Additionally, HSI has been proposed for laparoscopic cancer detection in colorectal surgery with demonstrated potential [21,22]. The ability of HSI to identify ideal transection margins for colorectal tissues has been demonstrated after devascularization by blocking vascular anastomoses [23]. Our group has reported HSI studies of head and neck cancer using ex-vivo human surgical specimens [24][25][26][27][28].
Barberio et al. demonstrated that HSI may help distinguish parathyroid glands from thyroid tissue during thyroidectomy, so that the parathyroid glands can be left intact [29]. In surgery for salivary tumors, one challenge is preserving the facial nerve, which runs through the parotid gland; injury to this nerve can cause facial paresis. Wisotzky et al. showed that HSI can identify the facial nerve in the parotid gland [30]. The submandibular and sublingual salivary glands are surrounded by a variety of normal tissues in the oral cavity. Previous work from our group has demonstrated that HSI can distinguish among normal tissues in the oral cavity, such as stratified squamous epithelium, normal salivary gland, and skeletal muscle [27].
In this large study of 82 patients, we investigate tumor detection in 200 thyroid tissue specimens from 76 patients using inter-patient testing experiments, and salivary gland tumor detection using 16 salivary gland tissue specimens from 6 patients. This is the most comprehensive study to date of tumor detection in the thyroid and salivary glands, and it thoroughly assesses the feasibility of label-free, non-contact, and non-ionizing HSI-based imaging modalities for computer-aided tumor detection. The outcomes of this work will help guide future HSI and autofluorescence studies and determine the specific benefits that HSI may offer for tumor detection in thyroid and salivary gland tissues.

Methods
In this study, ex-vivo tissue specimens from the thyroid and salivary glands were imaged with optical imaging modalities; histological sections were prepared from the specimens for ground truths; patients were categorized and used to train, validate, and test deep learning algorithms; and performance was calculated to compare the methods.

Ex-vivo surgical specimen dataset
For this study, 216 surgical specimens were acquired from 82 patients undergoing routine resection of thyroid tumors or salivary gland tumors at Emory University Hospital Midtown, who gave written informed consent to an institutional research coordinator. Table 1 shows the categorization of patients and tissue specimens. All patient data were de-identified by the research coordinator. The Institutional Review Board (IRB) of Emory University approved all research protocols and imaging methods. Three types of fresh, ex-vivo surgical specimens were obtained from the surgical pathology department during clinical service. We aimed to acquire a sample of normal tissue (N), tissue from the primary tumor (T), and a specimen of the tumor-involved margin that contains both tumor and normal tissue (TN), all of which were confirmed by histopathological analysis. The tissue specimens measured approximately 10×6×2 mm on average. Additionally, the final clinical pathology report was made available after de-identification. The tissue samples collected for this study were categorized by an experienced pathologist into six groups according to tumor subtype, divided into two broad cohorts: thyroid tumors and salivary gland tumors. The thyroid tumor cohort comprised 200 tissue specimens from 76 patients. The malignant tumors included in this cohort were PTC (N = 54), MTC (N = 5), insular carcinoma (N = 1), follicular carcinoma (N = 8), and poorly differentiated thyroid carcinoma (N = 3). The benign tumors of the thyroid were follicular adenomas (N = 5). The only thyroid patients excluded from this study were six patients with benign thyroid hyperplasia/goiter.
The salivary gland tumor cohort comprised 16 tissue specimens from 6 patients. Two patients had benign pleomorphic adenoma (N = 2) of the parotid gland. Four patients had malignant tumors of the salivary glands: mucoepidermoid carcinoma (N = 1), salivary duct carcinoma of the parotid gland (N = 1), PLGA of the hard palate (N = 1), and adenoid cystic carcinoma (N = 1). The patient demographics and relevant cancer properties are shown in Table 2.

Optical imaging modalities
To assess the ability of HSI for tumor detection, several other optical imaging modalities, both label-free and fluorescent dye-based, were acquired for comparison. It was hypothesized that HSI would outperform the dye-based fluorescence methods because 2-NBDG and proflavin lack sufficient target specificity. The following sections describe the image acquisition for hyperspectral reflectance imaging, HSI-synthesized RGB multiplex imaging, autofluorescence imaging, and two fluorescent dye-based imaging techniques: 2-NBDG and proflavin.

Hyperspectral imaging
A CRi Maestro HS system (Perkin Elmer Inc., Waltham, Massachusetts) was used to acquire HSI of the ex-vivo specimens. The HS system performs spectral scanning from 450 to 900 nm using a xenon light source and a liquid crystal tunable filter (LCTF) with 5 nm spectral resolution [24,31]. The image size of each HSI was 1040×1392×91 pixels (height×width×spectral bands), and the corresponding specimen-level spatial resolution was 25 µm per pixel. Acquisition time for an HSI was approximately one minute. The raw HS data (I_raw) were normalized band by band (λ) for all pixels (x, y) by subtracting the inherent dark current of the sensor (I_dark) and dividing by the dark-corrected signal of a white reference disk (I_white), according to the following equation:
$$I_{\mathrm{norm}}(\lambda, x, y) = \frac{I_{\mathrm{raw}}(\lambda, x, y) - I_{\mathrm{dark}}(\lambda, x, y)}{I_{\mathrm{white}}(\lambda, x, y) - I_{\mathrm{dark}}(\lambda, x, y)}$$
The average spectral signatures after white-dark calibration are shown for all groups included in this paper by cohort in Fig. 1.
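The white-dark calibration can be sketched in NumPy as follows. This is a minimal illustration, not the authors' MATLAB pre-processing: it assumes the standard white-dark form in which dark current is subtracted from both the sample and the white reference, and the cube layout (height × width × bands), the function name `calibrate_reflectance`, and the `eps` guard against division by zero are our own assumptions.

```python
import numpy as np

def calibrate_reflectance(raw, white, dark, eps=1e-6):
    """Band-by-band white-dark calibration of a hypercube.

    raw: (height, width, bands) array of raw counts.
    white, dark: white-reference and dark-current measurements, either with the
    same shape as `raw` or as (bands,) vectors that broadcast over all pixels.
    Returns normalized reflectance with the same shape as `raw`.
    """
    raw = np.asarray(raw, dtype=np.float64)
    white = np.asarray(white, dtype=np.float64)
    dark = np.asarray(dark, dtype=np.float64)
    # Remove dark current from sample and reference, then take their ratio;
    # eps (an assumption) avoids division by zero in dead bands.
    return (raw - dark) / np.maximum(white - dark, eps)

# A 450-900 nm scan in 5 nm steps gives the 91 bands used here.
wavelengths = np.arange(450, 901, 5)

# Toy 2 x 2 image whose raw signal sits halfway between dark and white.
dark = np.full(wavelengths.size, 100.0)
white = np.full(wavelengths.size, 4100.0)
raw = dark + 0.5 * (white - dark) * np.ones((2, 2, wavelengths.size))
reflectance = calibrate_reflectance(raw, white, dark)
print(wavelengths.size, reflectance.shape, float(reflectance.mean()))  # 91 (2, 2, 91) 0.5
```

Passing the references as per-band vectors assumes a spatially uniform illumination; full-frame reference images can be passed instead and are handled by the same expression.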

HSI-Synthesized RGB images
A multiplex image is a synthetic composite image generated from a hyperspectral image. For this work, several three-band (RGB) multiplex images were synthesized from the normalized reflectance HSI hypercubes. The first synthetic RGB, referred to as the HSI-synthesized Gaussian RGB composite, was generated from the HSI by applying a Gaussian kernel in each color region. The second RGB image was constructed from human color perception curves originally proposed by Judd et al. in 1951 [32] and extended by Vos in 1978 [33]. For some tissues, a standard RGB image was also captured with an RGB camera for comparison. Figure 2 shows a representative surgical tissue specimen of thyroid cancer as captured with an RGB camera, as an HSI-synthesized Gaussian RGB multiplex, and as an HSI-synthesized RGB with human-eye perception.
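The Gaussian composite can be sketched as a projection of each pixel's spectrum onto three normalized Gaussian kernels. The channel centers (610, 540, and 465 nm) and the shared width `sigma` below are placeholder assumptions, since the exact kernels are not given here; a human-eye composite would use the same kind of weighted sum with color-matching curves [32,33], sampled at the HSI wavelengths, in place of the Gaussians.

```python
import numpy as np

def gaussian_kernel(wavelengths, center, sigma):
    """Gaussian spectral weights, normalized to sum to one over the sampled bands."""
    w = np.exp(-0.5 * ((wavelengths - center) / sigma) ** 2)
    return w / w.sum()

def synthesize_gaussian_rgb(cube, wavelengths, centers=(610.0, 540.0, 465.0), sigma=25.0):
    """Project a (height, width, bands) reflectance cube onto one Gaussian kernel
    per color channel, returning a (height, width, 3) composite in R, G, B order.

    `centers` and `sigma` are illustrative placeholders, not the study's values.
    """
    weights = np.stack([gaussian_kernel(wavelengths, c, sigma) for c in centers], axis=1)  # (bands, 3)
    rgb = cube @ weights  # weighted sum over the spectral axis for every pixel
    return np.clip(rgb, 0.0, 1.0)

wavelengths = np.arange(450, 901, 5, dtype=np.float64)
cube = np.full((4, 5, wavelengths.size), 0.5)  # spectrally flat 50% reflector
rgb = synthesize_gaussian_rgb(cube, wavelengths)
print(rgb.shape)  # (4, 5, 3)
```

Because each kernel sums to one over the sampled bands, a spectrally flat reflector maps to a neutral gray of the same value, even for the blue kernel that is truncated at 450 nm.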
In this paper, we use HSI to simulate three-band images through RGB multiplex imaging. Real RGB cameras, however, typically use a Bayer filter [34,35] to approximate the color response of the human eye. While the blue and green spectral responses are typically consistent across sensors, different RGB camera sensor types differ in their sensitivity to red-channel components between 400 and 500 nm [34,35]. Therefore, for a subset of thyroid tumor specimens, the red-channel response between 400 and 500 nm was modified to test whether this affects the performance of the HSI-synthesized human-eye RGB multiplex images.
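The red-channel manipulation can be sketched as editing the red column of an RGB response matrix before projecting the hypercube onto it. The response curves below are hypothetical Gaussian stand-ins, not measured sensor or CIE data (the red curve is given a small short-wavelength lobe to mimic a 400-500 nm component). Because the HSI starts at 450 nm, only the 450-500 nm part of that range can actually be modified, and treating 500 nm as inside the range is our assumption.

```python
import numpy as np

def scale_red_short_wave(response, wavelengths, factor, band=(400.0, 500.0)):
    """Return a copy of a (bands, 3) RGB response matrix whose red column is
    multiplied by `factor` for wavelengths inside `band` (inclusive).

    factor = 1.0 keeps the curve, 0.5 halves the short-wave red component,
    and 0.0 removes it.
    """
    out = np.array(response, dtype=np.float64, copy=True)
    in_band = (wavelengths >= band[0]) & (wavelengths <= band[1])
    out[in_band, 0] *= factor
    return out

def gauss(wl, center, width):
    return np.exp(-0.5 * ((wl - center) / width) ** 2)

wl = np.arange(450, 901, 5, dtype=np.float64)
# Hypothetical response curves (not measured sensor or CIE data).
red = gauss(wl, 600, 40) + 0.35 * gauss(wl, 450, 20)
green = gauss(wl, 550, 40)
blue = gauss(wl, 450, 25)
response = np.stack([red, green, blue], axis=1)

variants = {f: scale_red_short_wave(response, wl, f) for f in (1.0, 0.5, 0.0)}
cube = np.random.default_rng(0).uniform(0.2, 0.8, size=(8, 8, wl.size))
composites = {f: cube @ r for f, r in variants.items()}  # one (8, 8, 3) image per variant
```

Each composite would then feed its own trained CNN, mirroring the comparison of the original, halved, and removed red components.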

Autofluorescence imaging
Autofluorescence imaging captures the emission from intrinsic fluorophores in biological tissue that are stimulated to fluoresce by external excitation light at specific wavelengths. The autofluorescence images were captured with the CRi Maestro imaging system using a 455 nm excitation source and a 490 nm long-pass filter. The long-pass filter blocks excitation light reflected from the tissue, so that only emitted photons, which are shifted to longer wavelengths (the Stokes shift), are captured. The autofluorescence images were acquired from 500 to 720 nm in 10 nm increments to produce a hypercube with 23 spectral bands and a final size of 1040×1392×23 pixels.

2-NBDG imaging
2-Deoxy-2-[(7-nitro-2,1,3-benzoxadiazol-4-yl)amino]-D-glucose (2-NBDG) is a fluorescently tagged glucose analog that highlights cancerous regions, which produce a stronger signal because of their higher metabolic glucose uptake. After the hyperspectral and autofluorescence imaging described above, the tissues were incubated for 20 minutes in a 160 µM 2-NBDG solution (Cayman Chemical, Ann Arbor, MI, USA) at 37 degrees Celsius, quickly rinsed in 1× phosphate-buffered saline (PBS) to remove excess dye, and imaged for fluorescence with the CRi Maestro. The images were acquired with the same 455 nm excitation light source and 490 nm long-pass filter from 500 to 720 nm in 10 nm increments, producing a hypercube with 23 spectral bands.

Proflavin imaging
The second dye used for fluorescence imaging was proflavin, whose signal is not confounded by the preceding 2-NBDG stain because proflavin fluorescence is considerably stronger. Proflavin binds to DNA and thus allows visualization of nuclear morphology, which can improve the performance of machine-learning-based cancer detection methods [36]. Proflavin also targets keratin, but this should not affect the glandular tissues involved in this study [36,37]. For proflavin imaging, the tissue samples were incubated for 2 minutes in a 0.01% proflavin solution (Sigma Aldrich, St. Louis, MO, USA) at room temperature and rinsed in PBS before imaging with the CRi Maestro. The images were acquired with a 455 nm excitation light source and a 490 nm long-pass filter from 500 to 720 nm in 10 nm increments, producing a hypercube with 23 spectral bands.

Histological ground truth
The ground truths for the optical imaging modalities were obtained from digitized histology images. After all images were acquired, the tissue specimens were inked at the top, bottom, left, and right edges and on the back surface to identify tissue orientation in histological sections. The tissues were then fixed in formalin, embedded in paraffin, and sectioned with a microtome into 5 µm slices taken from the surface that was optically imaged. The first high-quality slice was kept as the histology ground truth, stained with hematoxylin and eosin, and digitized by whole-slide scanning with a 40× objective [38]. A board-certified pathologist with expertise in head and neck pathology annotated the tumor and normal areas on the digital histology images.
A binary mask was made from the contoured digital histology images and served as the ground truth for the optical imaging modalities. Because of tissue deformation during histological processing and slide preparation, the histology ground-truth masks had to be registered to the gross-level optical images. The digital histology slide was registered to the gross-level HSI with a semi-automated method following a previously established pipeline of affine, landmark, and deformable registration [39,40]. The resulting transformation was applied to the binary histology mask, producing a ground-truth mask that matches the gross-level optical images of the tissue specimens.
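Applying an estimated transformation to a binary mask calls for nearest-neighbor interpolation so that the labels stay binary. The sketch below covers only an affine step, using SciPy's `scipy.ndimage.affine_transform`; the landmark and deformable stages of the registration pipeline [39,40] are not reproduced, and the helper name `warp_mask` and its forward-transform convention are assumptions.

```python
import numpy as np
from scipy.ndimage import affine_transform

def warp_mask(mask, forward_matrix, translation, output_shape=None):
    """Warp a binary mask with the forward affine map out = A @ in + t
    (coordinates in (row, col) order), using nearest-neighbor interpolation.

    scipy's affine_transform maps output coordinates back to input coordinates,
    so the inverse of the forward transform is passed to it.
    """
    A = np.asarray(forward_matrix, dtype=np.float64)
    t = np.asarray(translation, dtype=np.float64)
    A_inv = np.linalg.inv(A)
    warped = affine_transform(
        mask.astype(np.uint8),
        A_inv,
        offset=-A_inv @ t,
        output_shape=output_shape if output_shape is not None else mask.shape,
        order=0,          # nearest neighbor: no interpolated, non-binary values
        mode="constant",
        cval=0,
    )
    return warped > 0

mask = np.zeros((10, 10), dtype=bool)
mask[1:3, 1:3] = True                          # a 2 x 2 "tumor" region
shifted = warp_mask(mask, np.eye(2), (2, 3))   # shift down 2 rows, right 3 columns
```

Interpolating with `order=0` avoids the fractional values that a higher-order spline would create along the tumor contour, which would otherwise need re-thresholding.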

Experimental design
The deep learning experiments involved training, validation, and testing. However, the two cohorts, thyroid tumors (N = 76 patients) and salivary tumors (N = 6 patients), required different data-partitioning designs because of their very different sample sizes. The thyroid cohort was therefore used to produce fully independent inter-patient results, whereas the salivary cohort was used for intra-patient training and testing, as described in detail below.

Thyroid tumors
Tumor detection of the thyroid gland was performed on fully independent patients divided across 5 folds. Each fold served in turn as the fully independent testing group, while training and validation were performed on the patients in the remaining 4 folds, as depicted in Fig. 3. This design was selected to obtain test-level performance metrics for all 76 thyroid patients.
Fig. 3. Schematic depicting the experimental design of fully-independent training, validation, and testing paradigms for the 76 patient thyroid tumor cohort.
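A patient-level five-fold split like the one described above can be sketched in NumPy. Splitting on patient IDs rather than on specimens keeps every specimen of a patient in a single partition. The random seed, the 15% validation fraction (stated in this paper only for the salivary cohort), the hypothetical patient IDs, and the helper name `patient_folds` are assumptions.

```python
import numpy as np

def patient_folds(patient_ids, n_folds=5, val_fraction=0.15, seed=0):
    """Yield (train, val, test) lists of patient IDs, one tuple per fold.

    Each patient is tested exactly once, and within a fold no patient appears
    in more than one of train, validation, and test.
    """
    rng = np.random.default_rng(seed)
    patients = np.array(sorted(set(patient_ids)))
    rng.shuffle(patients)
    folds = np.array_split(patients, n_folds)
    for k in range(n_folds):
        test = folds[k]
        rest = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        n_val = max(1, int(round(val_fraction * len(rest))))
        yield rest[n_val:].tolist(), rest[:n_val].tolist(), test.tolist()

# Hypothetical patient IDs for the 76-patient thyroid cohort.
ids = [f"P{i:02d}" for i in range(76)]
splits = list(patient_folds(ids))
print([len(test) for _, _, test in splits])  # [16, 15, 15, 15, 15]
```

The T, N, and TN specimens of each patient would then be gathered according to the patient's partition in each fold.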

Salivary gland tumors
Tumor detection in the limited salivary gland tumor cohort was performed using intra-patient experiments. Training and validation were performed on the patients' primary tumor (T) and normal (N) tissues, and testing was performed on the tumor-normal (TN) margin tissue. Figure 4 shows a schematic diagram of the training and testing paradigm. The salivary gland cohort was separated into tumors of the parotid gland (N = 3 patients) and other salivary glands (N = 3 patients), as shown in Table 1.
Fig. 4. Flow diagram of intra-patient experiments of the salivary gland, with a representative tumor of the parotid gland. Intra-patient T and N tissues were used for MLP (multilayer perceptron) training, and TN tissue specimens were used for testing. The histological ground truth is shown with the tumor contour in green. The predicted tumor heat-map overlaid on the RGB image is shown with tumor predictions (red) and normal predictions (green). Areas of specular glare in the heat-map are not classified, and the ground-truth tumor contour is in blue.

Convolutional neural network
For thyroid tumor detection using 200 tissue specimens from 76 patients, a convolutional neural network (CNN) was developed to classify thyroid tissue as tumor or normal using a patch-based approach. The Inception-v4 CNN architecture [41] was selected because it is one of the top-performing CNNs on standard benchmarks such as ImageNet, yet has a manageable number of hyperparameters. Because HSI data pose unique challenges due to their size, the CNN was modified for HS data pre-processed into image patches of 25×25×C pixels, where C is the number of spectral bands. The first convolutional layers were modified for the smaller patch size necessitated by HS data, and the operating resolution in the modular inception blocks was reduced to allow more efficient training and classification. Additionally, squeeze-and-excitation modules were added to increase the performance of the CNN [42]. The implemented CNN architecture is detailed in Fig. 5.
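For readers unfamiliar with squeeze-and-excitation [42], the operation can be written as a short NumPy forward pass: global average pooling squeezes each channel to one number, a two-layer bottleneck produces per-channel gates in (0, 1), and the feature map is rescaled by those gates. This is a generic illustration with random weights and an assumed reduction ratio of 4, not the authors' TensorFlow layers.

```python
import numpy as np

def squeeze_excite(x, w1, b1, w2, b2):
    """Squeeze-and-excitation forward pass on a (height, width, channels) map.

    w1: (C, C // r) and w2: (C // r, C) for a reduction ratio r.
    """
    z = x.mean(axis=(0, 1))                       # squeeze: global average pool -> (C,)
    s = np.maximum(z @ w1 + b1, 0.0)              # bottleneck FC + ReLU -> (C // r,)
    gates = 1.0 / (1.0 + np.exp(-(s @ w2 + b2)))  # FC + sigmoid -> (C,) in (0, 1)
    return x * gates                              # rescale each channel

rng = np.random.default_rng(0)
C, r = 16, 4                                  # channels and assumed reduction ratio
x = rng.normal(size=(12, 12, C))              # toy feature map
w1, b1 = rng.normal(scale=0.1, size=(C, C // r)), np.zeros(C // r)
w2, b2 = rng.normal(scale=0.1, size=(C // r, C)), np.zeros(C)
y = squeeze_excite(x, w1, b1, w2, b2)
```

Because the gates depend on whole-map channel statistics, the module lets the network reweight channels (for HSI input, ultimately spectral features) at little extra cost.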
Image patches (25×25×C) were generated using a sliding window with a stride of 13 pixels, and the HSI data pre-processing was performed in MATLAB (MathWorks, Inc., Natick, MA). All deep learning programming was done with the TensorFlow Python package [43] on an Ubuntu machine, accelerated with CUDA on Titan-XP NVIDIA GPUs (Nvidia Corp., Santa Clara, CA). The CNN loss function was cross-entropy, the optimizer was Adadelta with an initial learning rate of 1.0, and validation performance was calculated every 2 epochs of training data. Each CNN model was trained for 14 epochs of 8× augmented (reflections and rotations) training data, which took about 23 hours. Classifying a new thyroid tissue specimen, which consisted of hundreds of patches, with a fully trained CNN model on a single GPU took 20 ± 8 seconds (avg. ± st. dev.) for all imaging modalities. Because the 25×25 patches were extracted with a stride of 13 pixels, neighboring patches overlap; the heat-maps were produced by averaging the results of overlapping pixel regions, which yields a smoother, less coarse final result.
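The patch pipeline described above can be sketched in NumPy: 25 × 25 windows at a stride of 13 pixels, an 8-fold augmentation built from the four 90° rotations of a patch and of its mirror image, and a heat-map formed by averaging the predictions of all patches covering each pixel. Assigning one patch-level tumor probability to every pixel of the patch is our assumption about how the overlap averaging works, and the function names and the stand-in probabilities are hypothetical.

```python
import numpy as np

PATCH, STRIDE = 25, 13

def patch_origins(height, width, patch=PATCH, stride=STRIDE):
    """Top-left corners of a sliding window that stays inside the image."""
    rows = range(0, height - patch + 1, stride)
    cols = range(0, width - patch + 1, stride)
    return [(r, c) for r in rows for c in cols]

def extract_patches(cube, patch=PATCH, stride=STRIDE):
    """Cut a (height, width, C) cube into (N, patch, patch, C) patches."""
    h, w, _ = cube.shape
    origins = patch_origins(h, w, patch, stride)
    patches = np.stack([cube[r:r + patch, c:c + patch] for r, c in origins])
    return patches, origins

def augment_8x(patch):
    """Four 90-degree rotations of the patch and of its mirror image (8 views)."""
    views = []
    for base in (patch, patch[:, ::-1]):
        for k in range(4):
            views.append(np.rot90(base, k, axes=(0, 1)))
    return views

def overlap_average_heatmap(probs, origins, shape, patch=PATCH):
    """Average patch-level tumor probabilities over every pixel each patch covers."""
    total = np.zeros(shape, dtype=np.float64)
    count = np.zeros(shape, dtype=np.float64)
    for p, (r, c) in zip(probs, origins):
        total[r:r + patch, c:c + patch] += p
        count[r:r + patch, c:c + patch] += 1
    heat = np.full(shape, np.nan)  # pixels that no patch covers stay NaN
    covered = count > 0
    heat[covered] = total[covered] / count[covered]
    return heat

rng = np.random.default_rng(0)
cube = rng.uniform(size=(64, 64, 3))        # e.g. a small 3-band multiplex image
patches, origins = extract_patches(cube)    # 4 x 4 = 16 patches of 25 x 25 x 3
views = augment_8x(patches[0])
probs = np.full(len(origins), 0.7)          # stand-in for CNN tumor probabilities
heat = overlap_average_heatmap(probs, origins, cube.shape[:2])
```

With a stride of about half the patch size, most pixels are covered by up to four patches, so isolated misclassified patches are diluted in the averaged heat-map.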

Multilayer perceptron
For salivary gland tumor detection in 16 tissue specimens from 6 patients, a simplified artificial neural network, a multilayer perceptron (MLP), was used for intra-patient detection with spectral information only. The MLP consisted of a 91-unit spectral input vector, a single hidden layer with 128 neurons, and an output layer of 2 nodes (normal or tumor). This simplified MLP was applied only to the salivary gland tumor cohort and was selected to limit overfitting on this small dataset. The salivary gland cohort was separated into parotid gland tumors (N = 3 patients) and other salivary gland tumors (N = 3 patients).
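A 91-128-2 MLP of the kind described above can be sketched in plain NumPy with ReLU hidden units, a softmax output, and full-batch gradient descent on cross-entropy. The activation, initialization, learning rate, per-band standardization, and the synthetic spectra are assumptions for illustration only; they are not specified in this section.

```python
import numpy as np

def init_mlp(n_in=91, n_hidden=128, n_out=2, seed=0):
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, np.sqrt(2.0 / n_in), (n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, np.sqrt(2.0 / n_hidden), (n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }

def forward(params, x):
    """Return hidden activations and class probabilities (normal, tumor)."""
    h = np.maximum(x @ params["W1"] + params["b1"], 0.0)
    logits = h @ params["W2"] + params["b2"]
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return h, p / p.sum(axis=1, keepdims=True)

def cross_entropy(p, y):
    return float(-np.mean(np.log(p[np.arange(len(y)), y] + 1e-12)))

def train_step(params, x, y, lr=0.01):
    """One full-batch gradient-descent step; y holds 0 (normal) or 1 (tumor)."""
    h, p = forward(params, x)
    d_logits = p.copy()
    d_logits[np.arange(len(y)), y] -= 1.0
    d_logits /= len(y)
    d_h = (d_logits @ params["W2"].T) * (h > 0)  # backpropagate through ReLU
    params["W2"] -= lr * h.T @ d_logits
    params["b2"] -= lr * d_logits.sum(axis=0)
    params["W1"] -= lr * x.T @ d_h
    params["b1"] -= lr * d_h.sum(axis=0)
    return params

# Synthetic 91-band spectra: tumor spectra are uniformly brighter (toy data only).
rng = np.random.default_rng(1)
x = np.vstack([0.3 + 0.02 * rng.normal(size=(100, 91)),
               0.7 + 0.02 * rng.normal(size=(100, 91))])
y = np.repeat([0, 1], 100)
mu, sd = x.mean(axis=0), x.std(axis=0)
x = (x - mu) / sd  # per-band standardization

params = init_mlp()
loss_before = cross_entropy(forward(params, x)[1], y)
for _ in range(300):
    params = train_step(params, x, y)
_, p = forward(params, x)
loss_after = cross_entropy(p, y)
accuracy = float(np.mean(p.argmax(axis=1) == y))
```

With a single hidden layer and roughly 12,000 weights, such a network trains in seconds on CPU, which is consistent with the choice of a small model for a small cohort.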
The spectral signatures of tissue were extracted by locally averaging 5×5 pixel blocks to reduce noise. Image pre-processing was used to remove specular glare pixels from both the training and testing data. For each group, the spectra of the normal (N) and tumor-only (T) specimens were used for training (85%) and validation (15%), and the spectra of the tumor-normal (TN) margin tissues were used for testing. For the parotid group and the other salivary gland group separately, all patients' training samples were combined into one training group (6 tissues), and the three TN test specimens were classified. Training took on the order of a few minutes, and testing took about one second.
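The spectral pre-processing described above might look as follows in NumPy. The non-overlapping 5 × 5 averaging follows the text; the glare rule (discarding averaged spectra whose mean reflectance exceeds 0.9) is hypothetical, since the glare-removal method is not specified, and the 85/15 split is drawn at random with an assumed seed. The function names are illustrative.

```python
import numpy as np

def block_average(cube, block=5):
    """Average non-overlapping block x block neighborhoods of a (height, width, bands)
    cube; edge rows/columns that do not fill a whole block are dropped."""
    h, w, b = cube.shape
    h2, w2 = h // block, w // block
    trimmed = cube[:h2 * block, :w2 * block]
    return trimmed.reshape(h2, block, w2, block, b).mean(axis=(1, 3))

def drop_glare(spectra, labels, threshold=0.9):
    """Hypothetical glare filter: keep spectra whose mean reflectance is <= threshold."""
    keep = spectra.mean(axis=1) <= threshold
    return spectra[keep], labels[keep]

def split_train_val(spectra, labels, val_fraction=0.15, seed=0):
    """Random 85/15 train/validation split of the T and N training spectra."""
    idx = np.random.default_rng(seed).permutation(len(spectra))
    n_val = int(round(val_fraction * len(spectra)))
    val, train = idx[:n_val], idx[n_val:]
    return spectra[train], labels[train], spectra[val], labels[val]

# Toy cube whose value equals the row index, to make the averaging easy to check.
cube = np.repeat(np.arange(10.0)[:, None, None], 10, axis=1)  # (10, 10, 1)
averaged = block_average(cube)                                 # (2, 2, 1): rows average to 2 and 7

# Toy tumor-only specimen: 50 x 60 pixels, 91 bands.
rng = np.random.default_rng(0)
specimen = rng.uniform(0.1, 1.0, size=(50, 60, 91))
spectra = block_average(specimen).reshape(-1, 91)  # one spectrum per 5 x 5 block
labels = np.ones(len(spectra), dtype=int)          # 1 = tumor
spectra, labels = drop_glare(spectra, labels)
x_tr, y_tr, x_va, y_va = split_train_val(spectra, labels)
```

Averaging 25 pixels reduces uncorrelated pixel noise by roughly a factor of five while keeping the 25 µm × 5 = 125 µm block size small relative to the specimens.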

Performance evaluation
The principal evaluation metric used in this study was the area under the receiver operating characteristic (ROC) curve (AUC). The AUC score was selected because it is robust to class imbalance within tissues and summarizes performance over all possible thresholds separating the normal and tumor classes. Accuracy, sensitivity, and specificity were also calculated and reported using the tumor probability threshold determined from the validation data. All results were calculated at the tissue-specimen level and averaged. For the final testing results of both cohorts, the 10 pixels at the tissue edge, corresponding to 0.25 mm, were excluded from performance calculations. Because the ex-vivo specimens were imaged on a flat surface, the tissue curved unnaturally at its edges, where it was too thin to provide an adequate imaging signal. Statistical significance of the test results was assessed using Student's t-test with a p-value threshold of 0.05.
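The per-specimen metrics can be sketched with NumPy and SciPy. AUC is computed here through its equivalence with the Mann-Whitney U statistic; accuracy, sensitivity, and specificity use a fixed probability threshold (in the study, the threshold came from the validation data); and the 10-pixel edge exclusion is approximated by eroding a tissue mask with `scipy.ndimage.binary_erosion`, whose default cross-shaped structuring element (city-block distance) is an assumption about how the edge was measured. The function names are hypothetical.

```python
import numpy as np
from scipy.ndimage import binary_erosion
from scipy.stats import rankdata

def roc_auc(labels, scores):
    """ROC AUC through the Mann-Whitney U statistic; tied scores get average ranks."""
    labels = np.asarray(labels).astype(bool)
    ranks = rankdata(scores)
    n_pos = labels.sum()
    n_neg = labels.size - n_pos
    u = ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0
    return float(u / (n_pos * n_neg))

def threshold_metrics(labels, scores, threshold):
    """Accuracy, sensitivity, and specificity at a fixed tumor-probability threshold."""
    labels = np.asarray(labels).astype(bool)
    pred = np.asarray(scores) >= threshold
    tp = np.sum(pred & labels)
    tn = np.sum(~pred & ~labels)
    fp = np.sum(pred & ~labels)
    fn = np.sum(~pred & labels)
    return {
        "accuracy": float((tp + tn) / labels.size),
        "sensitivity": float(tp / (tp + fn)),
        "specificity": float(tn / (tn + fp)),
    }

def exclude_tissue_edge(tissue_mask, margin_px=10):
    """Drop pixels within margin_px of the tissue border (10 px x 25 um = 0.25 mm)."""
    return binary_erosion(tissue_mask, iterations=margin_px)

labels = np.array([0, 0, 1, 1])
scores = np.array([0.10, 0.40, 0.35, 0.80])
auc = roc_auc(labels, scores)                      # 0.75
metrics = threshold_metrics(labels, scores, 0.5)   # accuracy 0.75, sensitivity 0.5, specificity 1.0

tissue = np.zeros((60, 60), dtype=bool)
tissue[10:50, 10:50] = True                        # 40 x 40 px specimen
core = exclude_tissue_edge(tissue)                 # 20 x 20 px remain after a 10 px erosion
```

In practice the AUC would be evaluated only on pixels inside `core`, then averaged over specimens.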

Results

Thyroid tumors
Tumor detection for the thyroid cohort with all cancer types combined (N = 76) showed that the HSI-synthesized RGB multiplex images performed best in terms of average AUC score, with 0.89 for the HSI-synthesized Gaussian RGB multiplex and 0.90 for the HSI-synthesized human-eye RGB multiplex. HSI achieved an average AUC score of 0.86. Full results of thyroid tumor detection by AUC score, accuracy, sensitivity, and specificity for all imaging methods, separated by cancer type, are shown in Table 3.
The average and median AUC scores across all thyroid tumor types are presented with statistical significance in Fig. 6(A) and (B). Combining all thyroid tumors, both implementations of three-band HSI-synthesized RGB multiplex imaging (average AUC score of 0.89 for the Gaussian-RGB multiplex and 0.90 for the human-eye RGB multiplex) significantly outperformed autofluorescence (AUC score of 0.85), 2-NBDG (AUC score of 0.86), and proflavin (AUC score of 0.83) (all p < 0.05). The HSI-synthesized human-eye RGB multiplex also significantly outperformed HSI (p < 0.05). For the PTC sub-group (N = 54), both HSI-synthesized RGB multiplex images significantly outperformed autofluorescence, 2-NBDG, and proflavin (all p < 0.05). For the MTC group (N = 6), both HSI-synthesized RGB multiplex images significantly outperformed 2-NBDG and proflavin (all p < 0.05). For the follicular tumor group, autofluorescence had the highest AUC score, but the difference was not significant (p > 0.05). Lastly, poorly differentiated thyroid carcinomas were classified with the highest AUC score by HSI-synthesized human-eye RGB multiplex imaging, but not significantly (p > 0.05).
The different imaging modalities and respective probability heat-maps for tumor detection are shown in Fig. 7 for all groups of thyroid tumors. As can be seen, HSI shows the most consistent heat-maps around regions of specular glare, compared to the HSI-synthesized RGB multiplex methods. As shown in Fig. 6(B), the median AUC scores show that HSI (0.95) and the two HSI-synthesized RGB multiplex methods (0.95 and 0.96) have approximately equivalent median performance for combined thyroid tumors.
The median AUC scores are substantially greater than the averages, which indicates a left-skewed distribution: most specimens are classified more accurately than the average suggests, and a small number of poorly classified specimens lower the mean. Histogram analysis of the per-specimen percent difference between HSI and the HSI-synthesized RGB multiplex methods shows that HSI-synthesized RGB outperforms HSI in a relatively small number of tissue specimens, which is why the average AUC scores of HSI-synthesized RGB multiplexing are greater than those of HSI. Figure 8(A) and (C) show the histograms of percent difference from HSI to the Gaussian-RGB multiplex and the human-eye RGB multiplex, respectively. Figure 8(B) and (D) show the tissue specimens with the largest differences in AUC score compared to HSI, for which HSI-synthesized RGB multiplexing still performs well.
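The mean-versus-median effect and the percent-difference histograms can be illustrated with made-up AUC values (not study data). The percent-difference definition (change relative to HSI) and the histogram bin edges are assumptions.

```python
import numpy as np

def percent_difference(auc_hsi, auc_rgb):
    """Per-specimen percent change in AUC from HSI to an RGB multiplex (assumed definition)."""
    auc_hsi = np.asarray(auc_hsi, dtype=np.float64)
    auc_rgb = np.asarray(auc_rgb, dtype=np.float64)
    return 100.0 * (auc_rgb - auc_hsi) / auc_hsi

# Made-up AUCs: five specimens agree closely, one is classified far better by RGB.
auc_hsi = np.array([0.96, 0.95, 0.97, 0.94, 0.40, 0.96])
auc_rgb = np.array([0.96, 0.96, 0.96, 0.95, 0.85, 0.95])

diff = percent_difference(auc_hsi, auc_rgb)
counts, edges = np.histogram(diff, bins=np.arange(-25, 150, 25))

# One poor HSI specimen lowers the HSI mean, while the medians stay equal.
print(np.mean(auc_rgb) > np.mean(auc_hsi), np.isclose(np.median(auc_rgb), np.median(auc_hsi)))  # True True
```

A single outlier bin far to the right of a tight cluster near zero is the histogram signature of averages that differ while medians agree.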
The three-band HSI-synthesized RGB multiplex images are meant to represent RGB imaging, but they are still constructed from HSI data. Standard RGB camera sensors differ in their response to the red-channel component between 400 and 500 nm. Therefore, for fold 1 of the HSI-synthesized human-eye RGB multiplex thyroid tumor detection, the red-channel component from 400 to 500 nm was scaled by one half and by zero, and two additional CNNs were trained. The results are plotted in Fig. 9. The original human-eye multiplex result for fold 1, using the unmodified 400-500 nm red component, was an AUC score of 0.90 for thyroid tumor detection. Completely eliminating the 400-500 nm red component (scaling by zero) still resulted in an equivalent AUC score of 0.90 (p > 0.05). Lastly, an equivalent AUC score of 0.89 was obtained when the 400-500 nm red component was set to half its original value (p > 0.05).
Table 3. Performance results of the optical imaging modalities for the thyroid tumor cohort (average ± SEM). The best performing modality for each group's evaluation metrics is bolded.
Fig. 7. Representative tissue images and corresponding classification heat-maps from all modalities from patients with thyroid carcinoma. Columns from left to right: histology, HSI with heat-map, HSI-synthesized Gaussian-RGB multiplex with heat-map, HSI-synthesized human-eye RGB multiplex with heat-map, autofluorescence with heat-map, 2-NBDG dye image with heat-map, proflavin dye image with heat-map. Rows from top to bottom: papillary thyroid carcinoma (PTC), medullary thyroid carcinoma (MTC), follicular thyroid carcinoma, and poorly differentiated thyroid carcinoma. The contours in white (in heat-maps) and green (on histology) outline the cancerous regions. Predicted tumor heat-maps range from dark blue (predicted normal) to dark red (predicted cancer).
Fig. 8. Differences in AUC score performance comparing HSI against HSI-synthesized Gaussian-RGB multiplex and HSI-synthesized human-eye RGB multiplexing. (a) Histogram of percent difference in AUC scores of tissue specimens between HSI and HSI-synthesized Gaussian-RGB multiplex imaging. The arrows show the bins that contain the patient specimens shown in the right panels, which are the two worst performing tissues. (b) RGB image of a tissue specimen with a large difference in AUC score performance between heat-maps produced from HSI, HSI-synthesized Gaussian multiplex, and HSI-synthesized human-eye multiplex image. (c) Histogram of percent difference in AUC scores of tissue specimens between HSI and HSI-synthesized human-eye RGB multiplex imaging. The arrows show the bins that contain the patient specimens shown in the right panels, which are the two worst performing tissues. (d) RGB image of the tissue specimen with the largest difference in AUC score performance between heat-maps produced from HSI, HSI-synthesized Gaussian multiplex, and HSI-synthesized human-eye multiplex image. The tumor margin is delineated in white.
To further investigate how the CNN uses HSI and which wavelengths are relevant for correctly predicting normal and tumor thyroid tissues, we applied the gradient-weighted class activation mapping (Grad-CAM) algorithm [44]. Briefly, the method traces the gradients of the score for the class of interest, either normal or tumor, back through the network, and these gradients are used to infer spectral feature saliency. The mean spectral signatures and class-activated gradients were averaged over 89 tissues that were correctly classified with high AUC scores and separated into normal thyroid and tumor (Fig. 10). As can be observed in Fig. 10(a), the salient spectral features for correctly classifying normal thyroid tissues were from 570-700 nm. In Fig. 10(b), the most salient spectral features for correctly classifying thyroid tumors were in a similar range of 550-700 nm, with bands near 500 and 750 nm contributing to a lesser degree.
Fig. 9. AUC score results from one fold of the testing data comparing different methods using HSI. The HSI-synthesized RGB multiplex images represent RGB imaging with different parameters. From left to right in the plot: original HSI method, 3-band Gaussian-RGB from HSI, original HSI-synthesized human-eye RGB from HSI, human-eye RGB from HSI synthesized with half of the 400-500 nm red component, and, last, human-eye RGB from HSI synthesized with none of the 400-500 nm red component. Values shown are the average AUC score from all tissues in one fold of testing data with 95% confidence interval error bars.
Fig. 10. Mean spectral signatures of correctly-classified normal thyroid tissues (a) and thyroid tumors (b). The saliency of spectral features is identified below each plot using the Grad-CAM technique. Red hues represent the most important features for correctly predicting each tissue class, and blue hues represent less important wavelengths for correctly predicting each class.
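The Grad-CAM weighting can be illustrated on a toy one-dimensional spectral model [44]: feature maps A_k(λ) from a convolution over wavelength, channel weights α_k obtained by averaging the gradient of the class score over wavelength, and a saliency profile ReLU(Σ_k α_k A_k(λ)) mapped back onto the input bands. The model (valid convolution, global average pooling, linear head), its random weights, and the function names are assumptions. For this architecture the gradient is constant along wavelength and is written analytically, so Grad-CAM coincides with the original class activation map; the study's 2D CNN would instead need automatic differentiation.

```python
import numpy as np

def conv1d_relu(x, kernels):
    """Valid 1-D cross-correlation of a spectrum with K kernels, followed by ReLU."""
    _, width = kernels.shape
    n = x.size - width + 1
    windows = np.stack([x[i:i + width] for i in range(n)])  # (n, width)
    return np.maximum(kernels @ windows.T, 0.0)              # (K, n)

def grad_cam_spectral(x, kernels, head, cls):
    """Grad-CAM over wavelength for a toy conv -> global-average-pool -> linear model.

    The class score is y_cls = sum_k head[k, cls] * mean_l A_k(l), so
    dy_cls / dA_k(l) = head[k, cls] / n at every position l.
    """
    A = conv1d_relu(x, kernels)                               # feature maps, (K, n)
    n = A.shape[1]
    grads = np.repeat(head[:, cls][:, None] / n, n, axis=1)   # dy/dA, (K, n)
    alpha = grads.mean(axis=1)                                # channel weights
    cam = np.maximum(alpha @ A, 0.0)                          # (n,)
    # Place each valid-convolution output at its window center on the band grid.
    centers = np.arange(n) + (kernels.shape[1] - 1) / 2.0
    cam_full = np.interp(np.arange(x.size), centers, cam)
    peak = cam_full.max()
    return cam_full / peak if peak > 0 else cam_full

rng = np.random.default_rng(0)
spectrum = rng.uniform(0.2, 0.8, size=91)   # toy 91-band reflectance spectrum
kernels = rng.normal(size=(4, 7))           # 4 hypothetical spectral filters of width 7
head = rng.normal(size=(4, 2))              # linear head: columns = (normal, tumor)
saliency_tumor = grad_cam_spectral(spectrum, kernels, head, cls=1)
```

Averaging such per-spectrum profiles over correctly classified tissues would give band-wise saliency curves comparable in spirit to those shown below the mean spectra.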

Salivary gland tumors
The intra-patient tumor detection results of the salivary gland tumor cohort are separated into parotid gland tumors and other salivary gland tumors. For parotid gland tumors, HSI was the best performing imaging modality, with an AUC score of 0.92, accuracy of 88%, sensitivity of 90%, and specificity of 79% (no differences were significant, p > 0.05). For tumors of other salivary glands, autofluorescence was the best performing imaging modality, with an AUC score of 0.80, accuracy of 84%, sensitivity of 77%, and specificity of 85% (no differences were significant, p > 0.05). The full results are shown in Table 4.

Discussion
The results of this extensive study suggest that label-free HS-based imaging and autofluorescence do indeed outperform the two fluorescent dye-based imaging methods tested here for thyroid tumor detection, although not to a significant degree. Interestingly, we found that HSI-synthesized RGB multiplex imaging significantly outperforms all imaging methods tested for thyroid tumor detection, including HSI and autofluorescence (p < 0.05). For salivary tumor detection, HSI performs best in the parotid gland and autofluorescence performs best in other salivary glands, but no difference was significant. As can be observed in Fig. 6(B), one main conclusion from this work is that, with sufficiently large datasets, many different optical imaging modalities can be used to create deep learning algorithms for tumor detection with median AUC scores of 0.90 and upwards. This is seen specifically in the combined thyroid tumor cohort. The experiments for the thyroid and salivary gland cohorts were designed differently because of the vastly different numbers of tissue samples collected. The thyroid tumor cohort was evaluated on fully independent testing patients because 200 tissue specimens from 76 patients were available. The salivary tumor cohort comprised only 16 tissues from 6 patients, so intra-patient experiments were performed, training on tumor-only and normal specimens and testing on tumor-normal margin tissues.
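The patient-independent design used for the thyroid cohort requires that all tissues from a given patient fall in the same partition, so that no patient contributes to both training and testing. A minimal sketch of such grouped splitting follows. The round-robin assignment of sorted patient IDs to folds and the fold count are illustrative assumptions, not the study's protocol; scikit-learn's `GroupKFold` offers a library implementation of the same idea.

```python
def patient_independent_folds(patient_ids, n_folds):
    """Split tissue indices so each patient appears in exactly one test fold.

    patient_ids[i] is the patient that tissue i came from. Patients (not
    tissues) are assigned round-robin to folds after sorting, which is a
    hypothetical scheme chosen for simplicity.
    Returns a list of (train_indices, test_indices) pairs.
    """
    patients = sorted(set(patient_ids))
    if n_folds < 2 or n_folds > len(patients):
        raise ValueError("n_folds must be between 2 and the number of patients")
    fold_of = {pid: i % n_folds for i, pid in enumerate(patients)}
    splits = []
    for fold in range(n_folds):
        test = [i for i, pid in enumerate(patient_ids) if fold_of[pid] == fold]
        train = [i for i, pid in enumerate(patient_ids) if fold_of[pid] != fold]
        splits.append((train, test))
    return splits


# Six tissues from four hypothetical patients.
tissue_patients = ["P1", "P1", "P2", "P3", "P3", "P4"]
splits = patient_independent_folds(tissue_patients, 2)
```

Splitting by tissue instead of by patient would let tissues from the same patient appear on both sides, which typically inflates test scores; the intra-patient salivary experiments trade that independence for usable training data.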
The hypothesis that HSI-based methods would outperform fluorescent dye-based methods was upheld in the combined thyroid tumor category, largely because it was supported in the PTC group (N = 54 patients), which comprises 71% of cases. However, it was not supported for the MTC, FTC, and poorly differentiated thyroid carcinoma groups (no p values were significant). Additionally, the thyroid tumor detection results show that HSI-synthesized human-eye RGB multiplex imaging significantly outperforms reflectance-based HSI. Despite differences in average AUC scores, the median values are equivalent, at around 0.95 (see Fig. 6). Exploring this phenomenon further, Fig. 8 demonstrates that only a few tissues differ between the HSI and HSI-synthesized RGB multiplex modalities. Moreover, the probability heat-maps from HSI appear to provide more consistent classification around regions of significant specular glare than the HSI-synthesized multiplex methods (Fig. 7). These results are consistent with a previous study from our group, limited to only 11 thyroid patients, in which RGB composite images (AUC score of 0.95) also outperformed HSI (AUC score of 0.92) [24].
The three-band multiplex images synthesized from HSI were intended to represent RGB imaging from a standard camera. However, these multiplex images are still constructed from HSI data. Additionally, standard RGB camera sensors differ in the spectral response of their red channel from 400-500 nm. The impact of this red component was studied, and altering it had no observed effect on the AUC score for HSI-synthesized human-eye RGB multiplexing. To provide physical intuition for this conclusion, the grad-CAM analysis reveals that the most salient spectral features for correctly classifying normal thyroid tissues were from 570-700 nm, well above this range. Future studies are required to investigate whether a standard RGB camera would indeed outperform HSI directly. Future work is also needed to capture more thyroid tumor HSI data with HS cameras of higher spatial and spectral resolution. It is possible that the 5 nm spectral resolution of this LCTF spectral-scanning HS system was inadequate for this study.
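To make the red-component experiment concrete, the sketch below synthesizes an RGB value from one pixel spectrum as a normalized weighted sum over bands. The red channel includes an optional secondary lobe in the 400-500 nm range, loosely analogous to the short-wavelength lobe of the CIE x̄ color-matching function, that can be scaled by 1, 0.5, or 0 (mirroring the full, half, and none conditions compared in Fig. 9). The Gaussian centers, widths, lobe amplitude, and wavelength grid are hypothetical placeholders, not the response curves used in the study; a faithful human-eye synthesis would use tabulated color-matching functions.

```python
import math


def gaussian(wl, center, sigma):
    """Unnormalized Gaussian response at wavelength wl (nm)."""
    return math.exp(-0.5 * ((wl - center) / sigma) ** 2)


def channel_weights(wavelengths, red_lobe_scale):
    """Hypothetical R, G, B spectral responses sampled at each band.

    Red has a main lobe near 600 nm plus a smaller lobe near 445 nm whose
    amplitude is multiplied by red_lobe_scale. All centers, widths, and
    amplitudes are illustrative assumptions.
    """
    red = [gaussian(wl, 600.0, 40.0)
           + red_lobe_scale * 0.35 * gaussian(wl, 445.0, 20.0)
           for wl in wavelengths]
    green = [gaussian(wl, 550.0, 40.0) for wl in wavelengths]
    blue = [gaussian(wl, 450.0, 30.0) for wl in wavelengths]
    return red, green, blue


def synthesize_rgb(spectrum, wavelengths, red_lobe_scale=1.0):
    """Collapse one pixel's reflectance spectrum into an (R, G, B) triple.

    Each channel is a weighted average of the spectrum, normalized by the
    sum of that channel's weights, so a flat spectrum maps to equal values.
    """
    rgb = []
    for weights in channel_weights(wavelengths, red_lobe_scale):
        total = sum(weights)
        rgb.append(sum(w * s for w, s in zip(weights, spectrum)) / total)
    return tuple(rgb)


# Illustrative 5 nm grid (an assumption, not the study's band range).
WAVELENGTHS = list(range(400, 805, 5))

# A spectrum that is dark below 500 nm and bright above it.
step = [0.0 if wl < 500 else 1.0 for wl in WAVELENGTHS]
rgb_full = synthesize_rgb(step, WAVELENGTHS, red_lobe_scale=1.0)
rgb_half = synthesize_rgb(step, WAVELENGTHS, red_lobe_scale=0.5)
rgb_none = synthesize_rgb(step, WAVELENGTHS, red_lobe_scale=0.0)
```

Only the red value changes as the lobe is scaled, and only by the share of red weight that falls in 400-500 nm. If a classifier's salient bands lie mostly above 500 nm, as the grad-CAM results suggest for normal thyroid tissue, such changes would plausibly have little effect on AUC.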

Conclusion
In conclusion, we present an extensive study using 216 tissue samples from 82 patients to evaluate the performance of HSI for tumor detection in the thyroid and salivary glands. For comparison with HSI, the tissues were imaged with label-free autofluorescence and two fluorescent dyes, 2-NBDG and proflavin. Additionally, three-band multiplex images representing the human-eye response and Gaussian RGBs were synthesized from HSI. Several CNNs were developed for tumor detection, achieving median AUC scores of 0.90 and higher for all imaging modalities in the combined thyroid tumor cohort. Investigating each group specifically, our results suggest that HSI-synthesized human-eye RGB multiplexing can classify thyroid tumors significantly better than HSI. In salivary glands, label-free HSI and autofluorescence may offer the best performance for tumor detection. This study demonstrates that HSI could aid surgeons and pathologists in detecting tumors of the thyroid and salivary glands.