Explainable liver tumor delineation in surgical specimens using hyperspectral imaging and deep learning

Surgical removal is the primary treatment for liver cancer, but frequent recurrence caused by residual malignant tissue remains an important challenge, as recurrence leads to high mortality. Distinguishing tumors from normal tissue by visual inspection alone is unreliable. Hyperspectral imaging (HSI) has proven to be a promising technology for intra-operative use, capturing the spatial and spectral information of tissue in a fast, non-contact and label-free manner. In this work, we investigated the feasibility of HSI for liver tumor delineation on surgical specimens using a multi-task U-Net framework. Measurements were performed on 19 patients, and a dataset of 36 specimens was collected with the corresponding pathological results serving as the ground truth. The developed framework achieves an overall sensitivity of 94.48% and a specificity of 87.22%, outperforming the baseline SVM method by a large margin. In particular, we propose to add explanations on the well-trained model from the spatial and spectral dimensions to show the contribution of


Introduction
Liver cancer is the sixth most commonly diagnosed cancer and the fourth leading cause of cancer death in the world, with approximately 841,000 new cases and 782,000 deaths each year [1]. Surgical removal is the primary treatment, but studies have shown that 30% to 50% of patients suffer from recurrence, which greatly reduces long-term survival [2]. Malignant tissue left behind during surgery is one of the factors most closely related to the recurrence of liver cancer. However, the gold standard assessment of a specimen can only be obtained from the pathology report after five to seven days, which is too late to provide feedback to the surgeon during the surgery. In this regard, intra-operative visual inspection is necessary. Hepatocellular carcinoma (HCC) and intrahepatic cholangiocarcinoma (ICC) together account for over 90% of primary liver cancer. Currently, clinical intraoperative imaging

Fig. 1.
Left: visualization of an example specimen under wavelength 500 nm-1000 nm. Middle: a hyperspectral image data cube consists of two spatial dimensions (x, y) and one spectral dimension (λ), which can be seen as spectrally resolved images or spatially resolved spectra. Two bounding boxes capture local pixels of primary liver tumor tissue (red) and normal liver tissue (blue), respectively. Right: all the spectra in these selected local areas (shaded area) and their average spectral curves across the channels (solid lines), illustrating the spectral differences between the two tissue types.
The biological origin of the spectra in hyperspectral images is explained in [7]. Because biological tissue contains structures and chromophores, light undergoes a series of scattering and absorption events as it travels through the tissue. Tissue absorption is a function of molecular composition. When a photon's energy matches the gap between the internal energy states of a molecule, the molecule absorbs the photon. Since molecules have different energy levels, the photon energies they can absorb vary accordingly, which serves as a spectral fingerprint for molecular diagnosis. Scattering, in turn, is related to the structure and number of organelles in the cell that have a refractive index different from the surrounding medium [8]. During tumor development, as the composition of the tissue changes, its pathological characteristics and its light absorption and scattering characteristics change correspondingly. In this way, the spectral characteristics of tissue captured by HSI can be converted into quantitative diagnostic information.
Previous studies reported promising results of HSI in tumor surgeries, e.g., glioblastoma tumor surgery [9][10][11], head and neck cancer surgery [12,13], and breast-conserving surgery [14,15], focusing mainly on fast and sensitive tumor detection and margin assessment. At present, little work has been reported on the application of label-free HSI in liver tumor surgery.
In this paper, for the precise intraoperative delineation of primary liver tumors, we collected a clinical hyperspectral image dataset of fresh ex-vivo liver tissues. Using the dataset, a U-Net framework was developed to automatically segment tumors on surgical tissues in order to improve the performance of automatic delineation to approach the post-operative gold standard.
Owing to improvements in image analysis via deep learning, much progress has been made in automated medical diagnosis on hyperspectral images [15,16]. However, how these models generate predictions from input data remains a black box. Particularly in safety-critical applications such as medical diagnosis, human-interpretable explanations of the results are essential [17]. Guided by visual appeal, existing studies [18][19][20] that consider interpretability mainly focus on extracting salient spatial regions of the input image, i.e., the regions that are relevant for the prediction of a learned model. In medical HSI applications, model interpretability has not been extensively studied.
In addition to spatial saliency, the spectral interpretability of hyperspectral images is also a question worth exploring. Since the spectral information is related to tissue composition, it can provide us with some biomedical insights from the model's decision. In this paper, we employ a saliency map-based method to interpret the outputs of our tumor delineation model. Upon that, a very small subset of channels with representative diagnostic information is selected through our proposed method for more specific spectral indications. The benefits of selecting a small subset of spectra are two-fold: i) The more detailed channels may reveal the biological value of specific wavelengths more explicitly; ii) Reducing the spectral dimension can effectively improve the hardware/software codesign by reducing system filter integration, data collection time and model complexity. Thus, our method demonstrates the potential of a low-latency HSI system design for real-time intra-operative use in the future.
The contributions of this paper are summarized as follows.
• To the best of our knowledge, this is the first work evaluating the feasibility of HSI on primary liver tumor surgery. We collect a clinical hyperspectral image dataset of fresh ex-vivo liver tissues on 19 patients. With pathological annotation as ground truth, we develop a U-Net deep learning framework to automatically delineate the tumor margin on fresh specimens.
• We propose to add post-hoc explanations to this framework to show the interpretability of the model's decision. A saliency map-based method is applied to the well-trained model to extract salient pixels and spectral channels from the hyperspectral image data. From the spectral interpretability, we propose a novel saliency-weighted method to select a very small but representative subset of spectral channels for the specific task.
• Experimental results demonstrate the effectiveness of our framework. The tumor delineation network achieves an overall accuracy of 93.16% and a sensitivity of 94.48%. In addition, we show that using only 5 spectral channels is sufficient to reproduce the result of using 224 channels with a small sensitivity degradation of 2%, and even outperforms the latter in accuracy and specificity.
• Notably, our investigation reveals the correlation of salient spectral channels with differences in absorption between normal and malignant liver tissues, indicating that the absorption characteristics of hemoglobin and bile at 420 nm and 605 nm might be promising indicators to be further exploited.

Study design
The ex-vivo study includes patients who underwent primary liver cancer surgery at Peking Union Medical College (PUMC) Hospital (Beijing, China). Following the experimental design of [13], we collected specimens during surgery; the corresponding pathological results were available five to seven days later. To evaluate the capability of HSI to detect and delineate tumorous tissue, only specimens containing both tumor and adjacent healthy liver tissue are included in this study. The overall workflow of dataset collection is shown in Fig. 2. Tissue slices are sampled without affecting the normal clinical process.

Fig. 2.
Overview of the dataset collection and result evaluation process. Surgical specimens are measured using an HSI setup (a). The data obtained from the HSI measurement is in the form of an image data cube (b), which is then processed by the automatic segmentation framework (c) to obtain a fast rendering result (d). The measured specimens are processed into H&E-stained sections (e) according to the standard pathological process, and then the tumor margin (f) is outlined by professional pathologists under the microscope. To obtain a ground truth annotated on the hyperspectral macro images, gray scale images of the RGB image (g) synthesized from the hyperspectral datacube and the pathological section image (h) were used for the registration. The registration follows a pipeline of global affine alignment (i) and a local deformation refinement (j). Accordingly, the ground truth on pathological sections can be mapped onto hyperspectral macro images (k), serving as supervision information for the auto-delineation network.
After resection, we carry out the hyperspectral image measurement of the fresh specimens in less than 15 minutes (Fig. 2(a)). During the measurement, a light-absorbing black-out material is placed under the tissue to avoid interference from background stray light. The measurement is followed by the standard pathological process: the measured specimens are processed through fixation, slicing, and hematoxylin and eosin (H&E) staining, and then analyzed under the microscope by professional pathologists. The gold standard results outlined by pathologists on the pathological sections are later mapped to the macroscopic spectral images, serving as the ground truth for algorithm evaluation (Fig. 2(k)). In this way, all hyperspectral images of specimens are paired with their corresponding pathological ground truth, thereby constructing a dataset for automatic tumor delineation. On this basis, we developed a deep learning network for tumor segmentation (Fig. 2(c)) and added a post-hoc saliency explanation on hyperspectral images. The performance of tumor delineation was evaluated by comparing its results with the pathological gold standard, demonstrating the effectiveness of HSI combined with deep learning models.

System specifications
Hyperspectral images in the dataset were acquired using a pushbroom hyperspectral camera (SPECIM FX10, Spectral Imaging Ltd., Finland) that captures light in the visible and near-infrared range (400-1,000 nm spectral coverage, 224 wavelength bands) with a CMOS sensor with a resolution of 1024 × 1 pixels. To obtain a full view of the target specimen, samples were placed on a translational platform and scanned underneath the camera. The captured images have a size of 224 wavelengths × 1024 pixels × a varying number of scanned lines. Six halogen light sources (DECOSTAR 51 ALU, OSRAM) were used for illumination from both sides at an angle of 45 degrees. Fig. 2(a) is a schematic diagram of the HSI setup.
During every measurement, a dark current image and a white reference image are also captured for data calibration. This preprocessing step converts raw hyperspectral images into normalized diffuse reflectance so as to correct for the spectral non-uniformity of the illumination and the influence of dark current [14]. The dark reference image is taken by closing the shutter of the camera. The white reference image is taken using a Spectralon target (SRT-50-050 Reflectance Target, ACAL Bfi Nordic AB, Uppsala) with known reflectance, providing the correction of the uneven illumination across the field of view. The linear behavior of the visual camera allows for a simple calibration of the data by using

$$X_{\mathrm{refl}} = \frac{X - R_{\mathrm{dark}}}{R_{\mathrm{white}} - R_{\mathrm{dark}}},$$

where $X_{\mathrm{refl}}$ denotes the calibrated spectrum and $X$ is the original measured spectrum. $R_{\mathrm{dark}}$ and $R_{\mathrm{white}}$ represent the dark and white reference, respectively.
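The dark/white reference calibration can be sketched in a few lines of NumPy. This is a minimal illustration, not the acquisition software used in the study: the array layout (scan lines × pixels × bands), the function name `calibrate_reflectance` and the small `eps` guard against division by zero are assumptions made for the example.

```python
import numpy as np

def calibrate_reflectance(raw, dark, white, eps=1e-8):
    """Convert raw counts to normalized reflectance.

    raw:   (lines, pixels, bands) raw hyperspectral cube (assumed layout)
    dark:  (pixels, bands) dark reference, shutter closed
    white: (pixels, bands) white reference from the reflectance target
    eps:   hypothetical guard value, not taken from the paper
    """
    raw = np.asarray(raw, dtype=np.float64)
    dark = np.asarray(dark, dtype=np.float64)
    white = np.asarray(white, dtype=np.float64)
    # Avoid division by zero where the white and dark references coincide.
    denom = np.maximum(white - dark, eps)
    # The references broadcast over the scan-line axis.
    return (raw - dark) / denom
```

With this convention, a pixel that equals the dark reference maps to 0 and a pixel that equals the white reference maps to 1.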

Ground-truth annotation
After HSI measurement, specimens are processed into pathological sections through fixation, slicing and staining. During section preparation the tissue undergoes deformation, so the pathological micro images differ from the hyperspectral macro images in both shape and morphology. To obtain a ground truth on the hyperspectral image, registration is required to map the tumor margin from the pathological section to the gross specimen. Following the basic pipeline in [21][22][23], registration is carried out in two steps, namely, global boundary alignment and local deformation refinement. Before the registration, we convert both images to grayscale and apply Gaussian smoothing to reduce the impact of morphological differences. First, a global affine transformation is applied using mutual information as the metric to measure the similarity between the multi-modal images. After that, the boundaries of the two images are aligned through rigid transformations (Fig. 2(i)). The resulting global transformation matrix T1 is then applied to the pathological image. In step two, structural landmarks are selected in both the pathological images and the hyperspectral images as control points for the subsequent fine-tuning of details. Then, a B-spline free-form deformation (FFD) non-rigid registration is applied to model the local deformation in the tissue [24]. The spline function-based registration method uses control points and spline functions to describe the nonlinear geometric transformation domain, which can effectively fit the elastic deformation of the object. We apply the deformation registration to images with a size within 320 × 320 pixels and an image sampling rate of 1.34 mm.
A control point spacing of 32 mm or 20 mm yields improved correlation compared to the global transformation. The registration quality was measured by the sum of squared intensity differences. The limited-memory BFGS (L-BFGS) algorithm [25] is applied to find the optimal deformed grid T2. Finally, the deformed grid T2 is applied to the transformed image from step one, so that the tumor margin is globally and locally registered (Fig. 2(j)). Accordingly, the outlined tumor margin from the microscopic image is mapped onto the optical image of the gross specimen, i.e., a hyperspectral image dataset of specimens with pathological ground truth is obtained. In addition, every specimen is labeled with an image-level diagnostic class indicating the absence or presence of the primary liver tumor. All registrations are implemented using MATLAB (R2020a).

Tumor delineation in surgical specimens
Model design. We carry out the tumor margin auto-delineation in the specimens following a multi-task strategy of segmentation and classification. Since the U-Net architecture [26] achieves outstanding results in medical segmentation tasks with a small data volume, we developed a U-Net architecture on the HSI dataset following its encoder-decoder scheme. The encoding network performs down-sampling, extracting features from both the spatial and spectral dimensions with multiple encoder blocks. The decoding network upsamples the feature map to the input image size using multiple decoder blocks. Skip connections between the encoder and the decoder provide finer information to the decoding network, which improves the segmentation result to a large extent. The multi-task network is implemented by adding a classification branch to the U-Net framework for the subsequent explanation of the decision-making process. Taking hyperspectral image patches as the input, one branch outputs the pixel-level segmentation mask of the liver tumor while the other outputs an image-level classification result indicating the absence or presence of the tumor. The network follows a fully convolutional encoder-decoder architecture. In the encoding branch, all convolution blocks are followed by batch normalization and ReLU activation. The detailed architecture is shown in Fig. 3. The model is trained with a multi-task loss $L = L_{seg} + L_{cls}$, where the categorical cross entropy loss $L_{seg}$ with the softmax function can be described as

$$L_{seg} = -\sum_{i} \sum_{k=1}^{K} 1\{y_i = k\} \log p_{model}(y_i = k).$$

Here, $i$ is the sample number, $k$ is the category number, and $K = 3$. The term $1\{y_i = k\}$ is the indicator function of sample $i$ belonging to class $k$, and $p_{model}(y_i = k)$ is the model's prediction $\hat{y}$.
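As a small numerical illustration of the categorical cross entropy term, the following NumPy sketch computes the loss from raw class scores. It is not the authors' PyTorch training code: the function names are hypothetical, and summing over samples rather than averaging is an assumption (dividing by the number of samples gives the commonly used mean).

```python
import numpy as np

def softmax(logits, axis=-1):
    """Numerically stable softmax along the class axis."""
    z = logits - np.max(logits, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)

def categorical_cross_entropy(logits, labels, eps=1e-12):
    """Sum over samples of -log p_model(y_i = true class).

    logits: (N, K) raw class scores, one row per sample (e.g. per pixel)
    labels: (N,) integer class indices in [0, K)
    eps:    hypothetical guard against log(0)
    """
    probs = softmax(np.asarray(logits, dtype=np.float64))
    labels = np.asarray(labels)
    # The indicator 1{y_i = k} picks out the probability of the true class.
    picked = probs[np.arange(labels.shape[0]), labels]
    return float(-np.sum(np.log(picked + eps)))
```

The same function covers any number of classes; only the number of columns in `logits` changes.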
In the case of classification, $K = 2$ and the binary cross entropy loss $L_{cls}$ can be specified as

$$L_{cls} = -\sum_{i} \left[ y_i \log \hat{y}_i + (1 - y_i) \log (1 - \hat{y}_i) \right].$$

Data preprocessing. Hyperspectral images are divided into small patches of 64 × 64 pixels as the model input, and a leave-one-patient-out cross-validation is applied to evaluate the model's performance. The number of tumor pixels is smaller than that of other tissues, which causes a serious imbalance among classes. Resampling patches of the different classes with an equal ratio effectively addresses this issue [27]. To improve the generalizability of the model, data is augmented in both the spatial and spectral dimensions during training [28]. Standard geometric transformations are applied to the spatial images, such as random horizontal and vertical flipping, rotation, scaling, and translation. In addition, we add radiation noise to every spectrum to improve the invariance of the model to spectral noise. In the experiment, a multiplicative noise generated from a uniform distribution ranging from 0.9 to 1.1 and an additive noise drawn from a normal distribution are applied to every single spectrum as the spectral augmentation.
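The spectral augmentation described above can be sketched as follows. This is an illustration under stated assumptions rather than the exact procedure of the study: one multiplicative factor is drawn per spectrum (per pixel), the patch layout is height × width × bands, and the standard deviation `noise_std` of the additive noise is a hypothetical value, since it is not given in the text.

```python
import numpy as np

def augment_spectra(patch, rng, scale_range=(0.9, 1.1), noise_std=0.01):
    """Apply multiplicative and additive noise to every spectrum of a patch.

    patch:       (H, W, bands) reflectance patch (assumed layout)
    rng:         numpy.random.Generator
    scale_range: bounds of the uniform multiplicative factor
    noise_std:   hypothetical std of the additive Gaussian noise
    """
    patch = np.asarray(patch, dtype=np.float64)
    h, w, bands = patch.shape
    # One gain per spectrum, shared by all bands of that pixel.
    gain = rng.uniform(scale_range[0], scale_range[1], size=(h, w, 1))
    # Independent additive noise for every band of every spectrum.
    noise = rng.normal(0.0, noise_std, size=(h, w, bands))
    return patch * gain + noise
```

A typical call would be `augment_spectra(patch, np.random.default_rng(0))`, applied on the fly to each training patch.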

Saliency in hyperspectral images
The performance of the tumor delineation model can be evaluated against the gold standard, i.e., the annotation of pathological sections, which indicates the reliability of the segmentation results. However, the annotation inevitably carries subtle uncertainties caused by manual delineation and auto-registration. For a reliable medical decision, we added post-hoc explanations to further evaluate the result. A rather simple but intuitive saliency map strategy is proposed to look into the model and further improve the overall design of the system (see Fig. 4). Saliency maps are a popular visualization technique for examining how and why a deep neural network makes certain predictions [19]. We applied them to show the image-specific saliency of a given class, assessing the result of segmentation. Compared with previous work, we focused on providing saliency explanations from both spatial pixels and spectral channels. Through one single back-propagation, the derivative $\omega$ of a specific predicted category with respect to the input image can be obtained from the well-trained model, with elements indexed by $h(i, j, c)$. Here, $(i, j)$ indicates the spatial arrangement of elements in $\omega$, while $c$ indicates spectral channels. From the aspect of spatial saliency, the map $M$ is computed as

$$M_{ij} = \max_{c} \left| \omega_{h(i,j,c)} \right|$$

in the case of a multi-channel image, which is the same as for a three-channel color image. Inspired by the spatial saliency, the spectral saliency scores $s$ can be accordingly specified as

$$s_{c} = \sum_{i} \sum_{j} \left| \omega_{h(i,j,c)} \right|,$$

i.e., by summing the contributions of all spatial pixels under one specific channel, which represents the contribution of the different spectral channels to the classification result.
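Given the gradient of the class score with respect to the input patch, the spatial and spectral saliency can be aggregated as in the sketch below. It assumes the gradient has already been obtained by back-propagation and is available as a NumPy array with layout height × width × channels; the use of absolute values, the function names and the min-max normalization of the spectral scores are assumptions of this example.

```python
import numpy as np

def spatial_saliency(grad):
    """Per-pixel saliency: maximum absolute gradient over spectral channels.

    grad: (H, W, C) derivative of the class score w.r.t. the input patch
    """
    return np.max(np.abs(grad), axis=2)

def spectral_saliency(grad):
    """Per-channel saliency: sum of absolute gradients over all pixels,
    min-max normalized to [0, 1]."""
    s = np.sum(np.abs(grad), axis=(0, 1))
    span = s.max() - s.min()
    # If all channels contribute equally, return zeros instead of dividing by 0.
    return (s - s.min()) / span if span > 0 else np.zeros_like(s)
```

The spatial map has one value per pixel and the spectral score vector has one value per channel, so both can be plotted directly against the image and the wavelength axis.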

Spectral channel selection
Since adjacent spectral channels are highly correlated, it is desirable to eliminate channels that may not carry discriminative information in this task. Previous work reported that a reduction of spectral channels could actually reduce the complexity of high-dimensional data like hyperspectral images and then improve the results [29,30]. However, strategies such as unsupervised correlation analysis [31,32] in the preprocessing step lack task adaptation, and the method of supervised training requires labeled data [33] to select discriminative channels. Here, we propose a novel method to find a discriminative subset by introducing the channel saliency (see Section 2.5).
Let $C = \{c_i\}_{i=1}^{n}$, $c_i \in \mathbb{R}^{m}$, be a ground spectrum set, where $c_i$ denotes the normalized spectral feature vector of channel $i$. Here, $n$ is the maximum number of spectral channels and $m$ represents the length of the data sets. Specifically, we assign the saliency scores $\{s_i\}_{i=1}^{n}$, $s_i \in \mathbb{R}$, as weights to the spectral feature vectors, serving as a task-specific prior for the channel selection process. Consequently, the saliency-weighted spectral features $c'_i$ are obtained by weighting each $c_i$ with its saliency score $s_i$, where $\sigma$ is an optional tuning factor that controls the influence of the saliency score. Thus, the task-specific subset selection problem can be described as follows: given an $m \times n$ saliency-weighted data set $C'$ with $m \geq n$, the objective is to select the $r$ significant channels out of the $n$ channels, i.e., to select an $m \times r$ subset $C_1$ ($r < n$) of $C'$ that contains the most salient information for this task. Channel selection is carried out by a deterministic and a randomized strategy, respectively, that is, QR factorization with column pivoting (QRcp) and determinantal point process (DPP) sampling. Note that in this paper, we mainly emphasize the idea of including the saliency scores in the subset selection process and specify the modifications added to these methods. Mathematical notions and implementation details can be found in [34][35][36][37].
QRcp. A column-pivoted QR factorization method [35,38] is adopted as a deterministic subset selection method with rank $r$. It is applied to capture the most salient channels by choosing those discriminative features that show minimal similarity among them in an orthogonal sense. In this task, the columns of $C'$ denote the L1-norm saliency-weighted spectral features of each channel. The column pivoting option allows us to detect dependencies among the channels of the data set. The column order is determined in a greedy fashion. The first column of $C'$ is selected according to the largest Euclidean norm weighted by its saliency, which means that a channel that is more salient for this task and more independent of its adjacent channels is more likely to be chosen as the first column of $C_1$. Successively, the column having the maximum orthogonal component to the first selection is selected, and so on.
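The greedy pivoting rule can be illustrated with the NumPy sketch below, which selects columns by repeatedly taking the largest residual norm and projecting that direction out of the remaining columns. It is a didactic sketch, not the implementation used in the study: the weighting `C * s**sigma` is an assumed form of the saliency weighting, and the function name and tolerance `tol` are hypothetical.

```python
import numpy as np

def saliency_weighted_qrcp(C, s, r, sigma=1.0, tol=1e-12):
    """Select r channel indices by greedy column pivoting.

    C:     (m, n) matrix whose columns are channel feature vectors
    s:     (n,) non-negative saliency scores
    r:     number of channels to select
    sigma: assumed exponent controlling the influence of the saliency
    """
    C = np.asarray(C, dtype=np.float64)
    s = np.asarray(s, dtype=np.float64)
    R = C * (s ** sigma)  # assumed weighting: column i scaled by s_i**sigma
    selected = []
    for _ in range(r):
        norms = np.linalg.norm(R, axis=0)
        if selected:
            norms[selected] = -1.0  # never pick the same column twice
        j = int(np.argmax(norms))
        if norms[j] <= tol:
            break  # remaining columns are linearly dependent on the selection
        selected.append(j)
        q = R[:, j] / norms[j]
        # Remove the component along q from every column (Gram-Schmidt step).
        R = R - np.outer(q, q @ R)
    return selected
```

In this scheme a highly salient channel that nearly duplicates an already selected one has a small residual norm and is skipped. Library routines such as `scipy.linalg.qr` with `pivoting=True` produce a comparable pivot order and would be the natural choice for the full 224-channel problem.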
DPP sampling. As an elegant probabilistic model, the DPP model [36,37] is adopted as a randomized method to find an $r$-element subset. According to the method, the probability assigned by a DPP to a subset $C_1$ is proportional to the determinant of the corresponding submatrix of the kernel matrix $L$, that is, $P(C_1) \propto \det(L_{C_1})$, which, according to the geometric interpretation of the determinant, is the square of the volume spanned by the spectral feature vectors $c_i$. Therefore, as the magnitude of a channel's feature vector increases, so do the probabilities of the sets containing that channel. Conversely, highly correlated spectral channels are less likely to co-occur in the subset, so diversity is encoded during the sampling process [31]. Specifically, the kernel $L$ is modified here to take the L1-norm saliency into account. Finally, the $r$-element subset with the maximum probability is selected.
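To make the determinant criterion concrete, the sketch below builds a Gram kernel from saliency-weighted channel features and greedily grows the subset whose kernel submatrix has the largest determinant. This is a greedy approximation chosen for illustration, not the sampling procedure of [36,37]: the kernel form `L = Cw.T @ Cw`, the weighting `C * s**sigma` and the function name are all assumptions.

```python
import numpy as np

def saliency_weighted_dpp_greedy(C, s, r, sigma=1.0):
    """Greedily pick r channels that maximize det(L[S, S]).

    C:     (m, n) matrix whose columns are channel feature vectors
    s:     (n,) non-negative saliency scores
    sigma: assumed exponent controlling the influence of the saliency
    """
    C = np.asarray(C, dtype=np.float64)
    s = np.asarray(s, dtype=np.float64)
    Cw = C * (s ** sigma)   # assumed saliency weighting of each column
    L = Cw.T @ Cw           # Gram kernel: L[i, j] = <c'_i, c'_j>
    n = L.shape[0]
    selected = []
    for _ in range(r):
        best_j, best_det = None, -np.inf
        for j in range(n):
            if j in selected:
                continue
            idx = selected + [j]
            # Squared volume spanned by the candidate subset.
            d = np.linalg.det(L[np.ix_(idx, idx)])
            if d > best_det:
                best_j, best_det = j, d
        selected.append(best_j)
    return selected
```

Two nearly identical channels span almost no volume, so their joint determinant is close to zero and they are unlikely to be selected together, which is the diversity property described above.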

Results
In this section, we first introduce the basic settings of the following experiment. Then, we present the tumor delineation result on the constructed dataset and compare its performance with the baseline method. A saliency map-based post-hoc explanation and channel selection method was employed on the well-trained model to explore the human-interpretable difference between tissues. The experiments are conducted under an inter-patient framework to evaluate the model's transferability on different individuals. We also discuss the result of intra-patient settings with the same network and emphasize the influence of tumor heterogeneity.

Experimental setup
Dataset description. A total of 19 patients (12 HCC cases and 7 ICC cases) and 36 surgical specimens were included in this study. As described in Section 2.1, we constructed a hyperspectral image dataset for primary liver tumor tissues with pathological annotations serving as the ground truth. Note that some of the specimens were taken from different locations of the same patient to evaluate the intra-patient performance. The characteristics of the patients and specimens are shown in Table 1. With the annotation on hyperspectral images, each pixel/spectrum was labeled as one of three classes: primary liver tumor, normal liver tissue or background. All the spectra of the different tissues included in this dataset and their average spectra are given in Fig. 5. As can be seen, although the spectra show a certain offset in amplitude, the shape of the spectra from the same type of tissue is similar. In our dataset, the spectra of HCC and ICC show high similarity in their spectral characteristics and are thus labeled as the same class, i.e., primary liver tumor. Both differ clearly from the spectra of normal tissue.

Network settings. On each specimen image, we randomly sampled 60 patches of tumor, normal tissue and background with an equal ratio. Each patch is assigned a label indicating the presence or absence of liver tumor. The experiment was carried out in a leave-one-patient-out cross-validation setting. In each cross-validation fold, a data set of around 6030 patches of size 64 × 64 × 224 from 18 patients was used as the training set and around 120 patches from 1 patient were left for validation. The model is optimized using stochastic gradient descent with a base learning rate of 0.001, decayed by a factor of 2 after 30 epochs, for a total of 100 epochs. All experiments were conducted on an NVIDIA DevBox equipped with four TITAN GPUs, and all deep learning models were implemented using PyTorch.
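The leave-one-patient-out protocol can be sketched as a small generator over patch indices. This is a minimal illustration that assumes each patch carries a patient identifier; the function name and the identifiers in the usage note are hypothetical.

```python
from collections import defaultdict

def leave_one_patient_out(patient_ids):
    """Yield (held_out_patient, train_indices, val_indices) per fold.

    patient_ids: sequence giving the patient identifier of every patch
    """
    by_patient = defaultdict(list)
    for idx, pid in enumerate(patient_ids):
        by_patient[pid].append(idx)
    for held_out in sorted(by_patient):
        # All patches of the held-out patient form the validation set.
        val = list(by_patient[held_out])
        # Patches of every other patient form the training set.
        train = [i for pid, idxs in by_patient.items()
                 if pid != held_out for i in idxs]
        yield held_out, train, val
```

For example, `list(leave_one_patient_out(["p1", "p1", "p2"]))` yields two folds, one per patient. Splitting by patient rather than by patch keeps all specimens of one patient on the same side of the split, which is what makes the evaluation inter-patient.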
Baseline methods. A support vector machine (SVM) based classification model was developed as the baseline method. Rather than splitting the image into patches, this pixel-based classification method treats a hyperspectral image as a library of spectra with different signatures to be distinguished. The SVM model separates different tissues by determining an optimal hyperplane that maximizes the margin between different classes in a higher-dimensional space. In this regard, SVM works on the spectrum of every single pixel, which means that texture information is neglected.
With the same patch setting in the U-Net framework, the multi-class classification was carried out using SVM through a one-against-all strategy [39].
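Performance metrics. The performance was measured by accuracy, sensitivity, specificity and the Dice similarity coefficient (DSC), defined in terms of true negatives (TN), false negatives (FN), true positives (TP) and false positives (FP). The metrics are computed as

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \mathrm{Sensitivity} = \frac{TP}{TP + FN},$$

$$\mathrm{Specificity} = \frac{TN}{TN + FP}, \qquad \mathrm{DSC} = \frac{2\,TP}{2\,TP + FP + FN}.$$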
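These metrics can be computed from a predicted and a reference binary tumor mask as in the following sketch. The standard definitions in terms of TP, TN, FP and FN are assumed, the function name is hypothetical, and returning NaN for an undefined ratio is a choice made for this example.

```python
import numpy as np

def delineation_metrics(pred, truth):
    """Accuracy, sensitivity, specificity and DSC for binary masks.

    pred, truth: arrays of the same shape, nonzero where tumor is marked
    """
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = int(np.sum(pred & truth))
    tn = int(np.sum(~pred & ~truth))
    fp = int(np.sum(pred & ~truth))
    fn = int(np.sum(~pred & truth))

    def ratio(num, den):
        # Undefined ratios (e.g. no tumor pixels at all) are reported as NaN.
        return num / den if den > 0 else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "dsc": ratio(2 * tp, 2 * tp + fp + fn),
    }
```

Background pixels would normally be excluded from both masks before the call, so that the counts refer only to tumor versus normal tissue.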

Tumor delineation
Inter-patient study. The results of tumor delineation on surgical specimens (N = 36) show the effectiveness of the U-Net architecture compared with the baseline SVM method. Figure 6 presents the performance of the two methods in terms of accuracy, sensitivity, specificity, and DSC. The developed U-Net framework outperforms SVM on this task by a substantial margin, reaching an overall accuracy of 93.06 ± 4.60%, a sensitivity of 93.75 ± 6.24%, a specificity of 87.22 ± 11.23%, and a DSC of 92.91 ± 6.93%, with lower standard deviation. Figure 7(d) and (e) give the visual delineation results of both methods in discriminating the malignant liver tumor from the normal liver tissue. It can be seen that SVM shows more noise in the results near the margin where normal and malignant tissues meet. Moreover, SVM performs worse on ICC specimens (see Fig. 7 S1 and S6) than on HCC samples, which manifests as salt-and-pepper noise. This greatly degrades its sensitivity. It is also worth mentioning that, under the same experimental settings, U-Net is superior to SVM in terms of time performance, reducing the training time by more than a factor of 3. Under the network settings in Section 3.1, it takes the U-Net architecture 4-5 hours to perform cross-validation training for 19 patients. In the forward pass, it takes 0.71 s on average to segment and classify one sample image.

Ablation study. An ablation study was performed over a number of facets of the model in order to better understand their effects on the performance. We mainly ablated the influence of the size of the receptive field, spectral data augmentation, spectral normalization, the number of patches sampled in every image, class reweighting on patches, and batch size (see Table 2).
Three factors are particularly worth mentioning. First, the size of the receptive field is not negligible. Three commonly used kernel sizes, 1, 3 and 5, were tested (E0-E2). The results suggest that increasing the receptive field clearly helps improve the model performance, indicating the importance of incorporating spatial information. Second, in contrast to previous work [14], adding spectral normalization degraded the performance significantly in our experiments (E0 and E7). A possible explanation is that the difference in spectral amplitude is quite helpful in distinguishing liver tumor tissue. The last facet is the spectral data augmentation (E8 and E0). Compared to noise-free data, the results show that adding small perturbations, e.g., radiation noise, improves the model's performance.

Intra-patient study. We also carried out the experiment under an intra-patient framework using the same deep learning architecture as in the inter-patient setting, but with the model trained and tested on specimens from the same patient, i.e., 90% of the data is used as the train-validation set and the rest as the test set. The intra-patient experiment reaches an accuracy of 97.91 ± 1.94%, a sensitivity of 97.80 ± 1.95%, a specificity of 99.28 ± 0.91% and a DSC of 98.47 ± 1.17%, surpassing the inter-patient framework by a large margin.
The result indicates that spectra of tissues from the same patient correspond much better than those from different patients. This would be helpful in the long-term follow-up study of patients. We could track the condition of patients before, during, and after tumor resection for quantitative analysis [12]. In addition, compared with the inter-patient study, the result confirms that tumor heterogeneity exists across patients. Although tumors of the same type have common characteristics, various causes (such as alcoholic fatty liver disease, cirrhosis, and HBV infection) lead to variance in their appearance, which greatly increases the difficulty of clinical inter-patient research. Therefore, a large amount of data is needed in the future to build a database that can cover the differences between patients.

Interpretability of the hyperspectral image
Salient pixels As a visual result of the classification branch, the spatial saliency depicted in Fig. 7(f) indicates where the model tells tumorous tissues from non-tumorous tissues. When the tissue is judged to be cancer, the pixels that play a major role in this judgment have a greater response and appear to be brighter. Similarly, we used the DSC metric to measure the similarity between the segmentation result ( Fig. 7(e)) and the saliency map ( Fig. 7(f)) by comparing their binary masks of malignant tissues. The results reported a DSC of 95.80 ± 3.29%, which indicates that the highlighted area in this saliency map has a high degree of consistency with the tumor segmentation results, confirming the reliability of the model. In fact, the saliency map itself is also an effective tool for segmentation. In this work, we merely use it as circumstantial evidence of the segmentation results to help us better understand and confirm the model's decision. The spatial saliency map can be regarded as another method to obtain the tumor margin. It can also be further used to improve the segmentation results in the future. Salient wavelengths Fig. 8(a) shows the saliency scores of every spectral channel, which guides us to pay attention to the difference between normal tissue and malignant tissue from the perspective of spectral information to establish a connection with tissue optics and even clinical pathology.
The contributions of the different spectral channels are shown as the gray curve in Fig. 8(a), where the spectral saliency scores are min-max normalized to the range [0, 1]. Not all channels contribute to this task. The contributions reach local maxima near 420 nm, 516 nm, 607 nm, 702 nm, 767 nm, 856 nm and 975 nm. Comparison with the average spectral curves (from Fig. 5(d)) reveals intuitive features at these locations, such as differences in slope (e.g., at 420 nm, 607 nm, and 856 nm) or in peaks (e.g., at 516 nm). This evidence provides a first, shallow explanation for the model's decision, suggesting that the model might use these features to distinguish the spectral curves.
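The normalization and peak-finding steps described above can be illustrated with a minimal sketch. The function names, the toy wavelength grid, and the toy scores are hypothetical and are not taken from the measured data; a strict "greater than both neighbours" rule is assumed for a local maximum.

```python
def min_max_normalize(scores):
    """Linearly rescale a list of saliency scores into [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # Constant input: no contrast to rescale, return all zeros.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def local_maxima(values):
    """Return interior indices whose value exceeds both neighbours."""
    return [
        i
        for i in range(1, len(values) - 1)
        if values[i] > values[i - 1] and values[i] > values[i + 1]
    ]


# Hypothetical toy data (NOT the measured saliency curve).
wavelengths_nm = [400, 420, 440, 500, 516, 540, 607, 650]
raw_scores = [0.2, 0.9, 0.4, 0.1, 0.6, 0.3, 0.8, 0.5]

normalized = min_max_normalize(raw_scores)
peaks_nm = [wavelengths_nm[i] for i in local_maxima(normalized)]
print(peaks_nm)
```

On a real saliency curve, some smoothing before peak detection would probably be needed, since small fluctuations between neighbouring channels would otherwise be reported as peaks.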
To further identify the discriminative channels, we ranked them by saliency score and performed a simple redundancy test in which the less relevant channels were eliminated proportionally. The results in Fig. 8(b) reveal that nearly 90% of the spectral information is redundant for distinguishing malignant tissue, since the top 10% most salient channels perform similarly to the full set. Once the proportion of channels drops below 10%, the performance begins to decrease. In addition, a performance improvement can be observed during channel reduction, which points to the importance of channel denoising. Although a large redundancy exists in the spectral dimension, channel selection by saliency ranking alone is not optimal because, as discussed in Section 2.6, adjacent channels are likely to be selected from the same region. A smaller set of predominant channels can possibly be found. To that end, we assume that a smaller set of spectral channels that achieves higher performance is more likely to approach the key factors that distinguish the tissues.
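The channel-ranking step of this redundancy test can be sketched as follows. This is only an illustration of how a proportion of the most salient channels might be retained; the function name and the toy scores are hypothetical, and the retraining and evaluation of the network on each reduced channel set are not shown.

```python
def top_fraction_channels(scores, fraction):
    """Return the indices of the most salient channels, in spectral order.

    `fraction` is the proportion of channels to keep (0 < fraction <= 1).
    At least one channel is always retained.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    n_keep = max(1, round(fraction * len(scores)))
    # Rank channel indices from most to least salient.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Restore spectral order so the subset can index a data cube directly.
    return sorted(ranked[:n_keep])


# Hypothetical saliency scores for eight channels.
scores = [0.05, 0.90, 0.40, 0.10, 0.60, 0.30, 0.80, 0.20]

for fraction in (1.0, 0.5, 0.25):
    print(fraction, top_fraction_channels(scores, fraction))
```

Each retained index set would then be used to slice the spectral dimension of the data cube before the model is evaluated on the reduced input.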
Channel selection using saliency scores. To further find the dominant channels that contain discriminative information, we apply the proposed saliency-weighted channel selection to narrow the range of the salient channels. The most salient channels are selected by the saliency-weighted QRcp and by the DPP model, respectively, and the results are compared with those obtained using all channels.
To extract the dominant channels, we compared the proposed saliency-weighted methods with other channel selection strategies: (1) random selection; (2) selection by ranking of the saliency scores; (3) manual selection at the saliency peaks; and (4) selection using the QRcp or DPP model on the spectral features without saliency scores. Empirically, we found that 5 channels achieve performance comparable to that of all 224 channels (a drop of roughly less than 1%). Thus, we suggest using 5 channels for this tumor delineation task. Specifically, the five channels extracted by saliency-weighted QRcp are 420 nm, 440 nm, 605 nm, 702 nm and 1000 nm, while saliency-weighted DPP extracts 420 nm, 605 nm, 702 nm, 847 nm and 1000 nm. The performance of the different selection strategies is given in Table 3, and the extracted channels are marked in Fig. 9(a).
Fig. 9. (a) ... Channels selected by the saliency-weighted methods (A5-A6) are diverse. (b) Visual inspection of the delineation results and spatial saliency map before and after channel selection. In addition to the improvement in the delineation results, the saliency map becomes more explicit after channel reduction, indicating the importance of discarding redundant information in the decision. (c) Absorption coefficients of the main absorbers in the liver (Hb, HbO2, bile, water and lipid) from 400 to 1000 nm on a logarithmic scale. The inset shows the normalized absorption coefficients of Hb, HbO2, and bile between 500 and 1000 nm. Figures are adapted from [40]. The large absorption peaks of Hb and bile at 420 nm and 605 nm, respectively, are marked.

The result in Fig. 9 shows the importance of considering the correlation between spectral channels. Not surprisingly, random selection achieved an unsatisfactory result, since noise makes no contribution to the decision-making process. If we select only by ranking the saliency scores, correlated channels might be redundantly selected (Fig. 9(A1)). Although we can manually choose the channels at the local maxima of the saliency scores (Fig. 9(A2)), this is not optimal: besides being time-consuming, different cross-validation settings cause subtle fluctuations in the spectral saliency map, leading to large uncertainties in manual selection. In this experiment, we used the saliency scores as a task-specific prior and the QRcp and DPP strategies to measure the correlation between spectra, so as to avoid selecting highly correlated channels. However, if we only analyze the similarity of spectra without considering saliency, it is difficult for the model to select the optimal subset for this task. As shown in Fig. 9(A3-A4), channels located at valleys of the saliency scores (e.g., 486 nm, 790 nm) were also included in the subset, which is clearly unsuitable for this task because their contribution is not significant. Therefore, we performed the saliency-weighted channel selection to find the dominant channels automatically. As compared in Table 3, the two strategies outperformed the others and reported the same dominant channels for discriminating primary liver tumor at 420 nm, 605 nm, 702 nm and 993 nm, with a differing selection at 440 nm and 847 nm (Fig. 9(A5-A6)). The DPP is a randomized method for selecting subsets, while QRcp is a deterministic method for selecting orthogonal features. Both strategies can extract the key channels for a specific task well; in practice, the QR strategy is preferable due to its deterministic nature.
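To make the idea of combining saliency with a correlation-aware selection more concrete, the following is a minimal sketch of one possible saliency-weighted column-pivoting scheme. It is not the formulation of Section 2.6: it assumes that each channel (a column of the spectra matrix) is simply scaled by its saliency score, and it then applies greedy pivoted Gram-Schmidt, which chooses the same pivots as QR with column pivoting in exact arithmetic. The function name and the toy data are hypothetical.

```python
import math


def saliency_weighted_qrcp(spectra, saliency, k):
    """Greedy column-pivoting channel selection on saliency-scaled spectra.

    spectra  : list of rows, one spectrum per sample (samples x channels)
    saliency : one non-negative weight per channel (assumed weighting)
    k        : number of channels to select
    Returns the selected channel indices in order of selection.
    """
    n_channels = len(saliency)
    # Assumed weighting: scale each channel column by its saliency score.
    cols = [[row[j] * saliency[j] for row in spectra] for j in range(n_channels)]
    selected = []
    for _ in range(min(k, n_channels)):
        remaining = [j for j in range(n_channels) if j not in selected]
        norms = {j: math.sqrt(sum(v * v for v in cols[j])) for j in remaining}
        # Pivot: the remaining column with the largest residual norm.
        pivot = max(remaining, key=lambda j: norms[j])
        if norms[pivot] < 1e-12:
            break  # Remaining columns are numerically dependent.
        q = [v / norms[pivot] for v in cols[pivot]]
        selected.append(pivot)
        # Remove the pivot direction from every column not yet selected.
        for j in remaining:
            if j == pivot:
                continue
            dot = sum(a * b for a, b in zip(q, cols[j]))
            cols[j] = [a - dot * b for a, b in zip(cols[j], q)]
    return selected


# Hypothetical toy data: channels 0 and 1 are nearly identical and both
# highly salient; channel 2 is distinct; channel 3 has low saliency.
spectra = [
    [1.0, 1.0, 0.0, 0.2],
    [2.0, 2.0, 1.0, 0.1],
    [3.0, 3.1, 0.0, 0.3],
    [4.0, 4.0, 1.0, 0.2],
]
saliency = [1.0, 0.9, 0.8, 0.1]

selected = saliency_weighted_qrcp(spectra, saliency, k=2)
ranked = sorted(range(len(saliency)), key=lambda j: saliency[j], reverse=True)[:2]
print(selected, ranked)
```

In this toy case, plain saliency ranking keeps the two nearly identical channels, whereas the pivoted selection skips the redundant one in favour of the distinct channel, which is the behaviour the comparison of strategies above is meant to illustrate.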
The results also suggest that extracting a small but discriminative subset can effectively reduce data noise and even achieve better performance than using all spectra. After channel pruning, fewer irrelevant pixels, such as the background, are highlighted in the spatial saliency map (see Fig. 9(b)), indicating higher reliability in the model's decision-making process.
Possible indications. From the perspective of tissue optics, considering the main absorbers in the liver, i.e., oxygenated and deoxygenated hemoglobin (HbO2 and Hb), water, lipid, and bile, allows some qualitative speculations.
Normal liver tissue typically receives its blood supply from the hepatic portal vein, while primary liver tumor tissue receives most of its blood supply from the hepatic artery. Thus, the hemoglobin absorption bands could be very different. Specifically, Hb has its highest absorption peak at 420 nm, and HbO2 at 410 nm [8]. The salient channel at 420 nm is therefore closely related to the absorption peaks of Hb and HbO2. Another important indicator in the visible range is bile. Healthy liver tissue allows the perfusion of bile, while liver tumors, such as ICC, have different structures, resulting in different bile perfusion. The 605 nm channel obtained through channel selection is in close agreement with previous work [40], which reports that bile has a large absorption peak centered exactly at 605 nm (Fig. 9(c)). As the wavelength increases into the near-infrared region, water absorption prevails and tends to be consistent with that of bile, because bile is mainly composed of water [40]. The spectral signature around 1000 nm provides possible evidence of differences in the water content of bile or blood between the two tissues. In addition, scattering becomes more pronounced than absorption in the near-infrared band. Therefore, we suggest that the significant channels at 702 nm, 847 nm, and 1000 nm be analyzed in combination with scattering characteristics in future work, for example changes in the nucleus and organelles during tissue development that alter the light scattering characteristics.

Discussion
HSI is a promising technology for future surgical applications owing to its non-contact, label-free and fast imaging characteristics. In this study, we evaluated the effectiveness of HSI combined with a deep learning framework for tumor delineation on fresh liver specimens. Since a human-intelligible result is more acceptable than one from an opaque black box, especially for practical use in the medical domain, we emphasize the importance of adding explanations to the model's decision. Specifically, a saliency map-based analysis is performed on medical hyperspectral images from both the spatial and spectral aspects. On the basis of spectral saliency, we proposed a novel method to further extract key channels with possible indications, which can also guide the selection of filters for a multi-spectral camera intended for practical use in the future. To the best of our knowledge, this is the first work studying intra-operative liver specimens using HSI. We focus on primary liver tumors, but the methods proposed in this work can also be applied to other tumor types.
In terms of automatic tumor delineation, the developed U-Net framework outperforms the pixel-wise classification method by a large margin, and the ablation study shows that larger receptive fields produce better performance. These results indicate the importance of using spatial features during learning. In addition, we added a classification head to the network architecture as a second output that provides an image-level classification result. In this work, we use it only to explain the segmentation results and to confirm that the model's decision is trustworthy. In subsequent explorations, this multi-task framework could also be extended to perform pathological classification, e.g., to distinguish HCC from ICC and to identify the wavelengths responsible for distinguishing subtle pathological differences. In practice, the margin of ICC is more difficult to identify, and the curative resection rate of ICC is lower than that of HCC [41,42]. Therefore, more research can be carried out on the identification of the ICC margin. Limited by the amount of data, we did not conduct further pathological analysis in this experiment; we are currently collecting more specimens for this purpose. In exploring the model's generalization capability across patients, the results reveal an obvious inter-individual variance compared with the intra-patient performance, which further reinforces that tumor heterogeneity is a non-negligible factor in clinical studies. In that sense, again, a larger amount of data is needed to cover more cases.
With regard to spectral interpretability, we demonstrate that the benefits of exploring salient spectra are two-fold. First, significant dimension reduction explicitly shows the contributions of different spectral channels, thus revealing an understandable decision-making process of the model. From the results, we conclude that Hb and bile content may be indicative in distinguishing liver tumors from normal tissues. Similarly, meaningful spectral features can also be sought from the metabolic differences between tumors and normal tissues. These two approaches can benefit from each other's insights to better understand the process of tumor development. For example, previous studies reported a possible association between bile acids and cancer development [43]. The bile acid profile in patients with primary liver cancer is significantly different from that in healthy people. Alterations in bile acids may affect hepatic metabolic homeostasis and contribute to the pathogenesis and development of liver cancer [44][45][46]. We believe it would be meaningful to further explore the spectral characteristics of bile acids. However, the wavebands used in this experiment lie within 400-1000 nm, so the results are limited to chromophores that are predominant in this range. Follow-up explorations can also consider establishing further connections with the characteristic bands of tumor metabolism, such as those of glucose and lipids [47][48][49][50][51], at the corresponding wavebands.
Second, reducing the spectral dimension benefits hardware design: it helps select filters for a specific application scenario and also reduces data collection time and model complexity. The hardware design of a multi-spectral imaging system is a necessary step towards future intra-operative studies. To return to the motivation of this study, our purpose is to provide surgeons with direct feedback on pathological indications during the operation, ahead of the time-consuming post-operative pathology report. This requires an HSI setup that measures the tissue in real time. As far as we know, all previous studies, including our preliminary efforts, are based on ex-vivo specimens or use line-scanning hyperspectral cameras as a first step towards clinical use during the operation. It is currently almost impossible to actually use them on the surgical bed. According to the principles of HSI system design [52], there is always a trade-off among data acquisition time, spatial resolution, and spectral resolution, since real-time acquisition of three-dimensional data by a two-dimensional image sensor is a challenge in itself. Some system performance always has to be sacrificed to meet the requirements of a specific application scenario; that is, a task-specific design is required. For a real-time medical application, line-scanning or spectrum-scanning HSI systems are not suitable because they need a long data acquisition time. Fast imaging and a large field of view must be ensured for careful inspection, so a sacrifice in the number of spectral channels is inevitable.
Fortunately, studies have shown that a small subset of spectral channels is sufficient to reproduce, or even improve on, the best performance obtained with all channels. This is easy to understand, because hundreds of spectral channels bring not only abundant information to the training process but also a considerable amount of noise. Therefore, the saliency-weighted channel pruning method provides a novel way to select the bands of a multi-spectral imaging system for future spectral imaging applications in real-time intra-operative tasks.

Conclusions
In conclusion, this work evaluated the feasibility of HSI for tumor delineation on fresh ex-vivo liver specimens using a deep learning framework. We developed a U-Net segmentation framework and added a saliency map-based post-hoc explanation of the hyperspectral images to explore the spatial and spectral interpretability of the model's decision. The selected spectral channels appear to be closely related to the differences in the average spectral curves and in the absorption coefficients between normal and malignant tissues. These salient spectral channels can be further pruned to a relatively small subset that performs a real-life task without degrading performance. This work preliminarily confirmed the feasibility and effectiveness of HSI in delineating liver tumors. Future work will focus on improving the HSI performance required for intra-operative use, paving the way for real-time assistance during surgery.