Development and validation of Raman spectroscopic classification models to discriminate tongue squamous cell carcinoma from non-tumorous tissue Oral Oncology

Background: Currently, up to 85% of the oral resection specimens have inadequate resection margins, of which the majority is located in the deeper soft tissue layers. The prognosis of patients with oral cavity squamous cell carcinoma (OCSCC) of the tongue is negatively affected by these inadequate surgical resections. Raman spectroscopy, an optical technique, can potentially be used for intra-operative evaluation of resection margins.
Objective: To develop in vitro Raman spectroscopy-based tissue classification models that discriminate OCSCC of the tongue from (subepithelial) non-tumorous tissue.
Materials and methods: Tissue classification models were developed using Principal Components Analysis (PCA) followed by (hierarchical) Linear Discriminant Analysis ((h)LDA). The models were based on a training set of 720 histopathologically annotated Raman spectra, obtained from 25 tongue samples (11 OCSCC and 14 normal) of 10 patients, and were validated by means of an independent validation set of 367 spectra, obtained from 19 tongue samples (6 OCSCC and 13 normal) of 11 patients.
Results: A PCA-LDA tissue classification model ‘tumor’ versus ‘non-tumorous tissue’ (i.e. surface squamous epithelium, connective tissue, muscle, adipose tissue, gland and nerve) showed an accuracy of 86% (sensitivity: 100%, specificity: 66%). A two-step PCA-hLDA tissue classification model ‘tumor’ versus ‘non-tumorous tissue’ showed an accuracy of 91% (sensitivity: 100%, specificity: 78%).
Conclusion: An accurate PCA-hLDA Raman spectroscopy-based tissue classification model for discrimination between OCSCC and (especially the subepithelial) non-tumorous tongue tissue was developed and validated. This model with high sensitivity and specificity may prove to be very helpful to detect tumor in the resection margins.


Introduction
Every year 300,000 new cases of oral cavity squamous cell carcinomas (OCSCCs) are diagnosed worldwide [1] and only half of these patients will survive 5 years [2]. The prognosis is negatively affected by inadequate surgical resections [3][4][5]. Nevertheless, it was recently shown that in current practice up to 85% of the oral resection specimens have inadequate resection margins (≤5 mm distance between tumor border and resection surface) [3,4].
During operation, the surgeon attempts to define the borders of the tumor by visual inspection and palpation. Additionally, a so-called frozen section procedure can be used for intra-operative histopathological examination of suspicious regions [6]. However, the diagnostic accuracy of the frozen section procedure depends on how well the sampled tissue represents the actual resection margin that is suspected to be inadequate [7]. Since in the oral cavity up to 87% of all tumor-positive margins are located in the deeper soft tissue layers [8], the common practice of taking samples of the epithelial margin is of limited value. Moreover, sampling error is often inevitable because only a small portion of the resection margins can be evaluated by frozen sections, as the procedure is too time-consuming and laborious [9]. Thus, an intra-operative tool that provides a real-time and objective evaluation of all resection margins (especially the deeper soft tissue margins) may increase the number of adequate resections and thereby improve the prognosis of patients with OCSCC.
Raman spectroscopy is an optical technique that is suited for such intra-operative use because it is nondestructive and no pretreatment or labeling of the tissue is needed [10]. The technique is based on inelastic scattering of light by molecules [11]. A Raman spectrum contains characteristic peaks that are assigned to a corresponding molecular structure within the illuminated tissue. Thus, Raman spectroscopy enables tissue characterization based on objective molecular information [12]. Because of these attractive properties, there has been much interest in the use of Raman spectroscopy for differentiation between tumor and non-tumorous tissue [13][14][15][16] in the head and neck region [17][18][19][20], including the oral cavity [21][22][23][24][25][26][27].
Tongue, which is the most common subsite of OCSCC [28], comprises different histological structures and layers [29], all having their own specific Raman spectroscopic features. In order to provide a tool that detects OCSCC within the surrounding non-tumorous tongue tissue, a good understanding of all histological structures and their spectroscopic features is needed. Histopathologically annotated Raman spectra were used in our previous work [30] to enable distinction between OCSCC of the tongue and individual tissue structures, including those in the deeper soft tissue. Due to the large differences in lipid-protein ratio, a 100% accurate distinction was possible between OCSCC and adipose tissue, and OCSCC and nerve. Although the other deeper located healthy tissue structures (connective tissue (CT), gland and muscle) had a greater spectral similarity to the OCSCC spectrum, these structures were also spectrally distinguishable from OCSCC with high accuracy (93%, 94%, and 97% respectively). Furthermore, as might be expected by similarities in biochemical composition, the spectral features of OCSCC and surface squamous epithelium (i.e. non-tumorous squamous epithelium covering the surface of the tongue) were partly overlapping, resulting in a lower discriminatory power of 75% [30].
The objective of our current study was to prove the potential of Raman spectroscopy in discriminating OCSCC from non-tumorous tongue tissue, by developing in vitro tissue classification models based on spectral data of individual non-tumorous tissue structures. The accuracy of the models was validated by means of an independent dataset.

Sample handling and sample preparation
At the department of Otorhinolaryngology and Head and Neck Surgery of the Erasmus MC Cancer Institute, University Medical Center Rotterdam, 44 tissue samples were collected from 21 patients who had undergone a surgical resection because of a primary OCSCC of the tongue. Informed consent was obtained prior to the operation according to the protocol approved by the Medical Ethics Committee (MEC-2011-450) of the Erasmus University Medical Center Rotterdam.
Seventeen samples (from 14 patients) contained OCSCC and non-tumorous tissue (i.e. surface squamous epithelium, CT, muscle, adipose tissue, gland and nerve). These samples were harvested from the fresh resection specimens, from a region with macroscopically visible tumor. The other 27 samples (from 19 patients) contained only non-tumorous tissue and were harvested from 2 locations: (1) 14 samples (from 12 patients) were taken from the resection specimen within a macroscopically normal-appearing region adjacent to the tumor, and (2) 13 samples (from 13 patients) were taken from the contralateral (not-affected) edge of the tongue.
All samples were at least 5 × 5 mm in size. Samples from the surgical resection specimens were taken within 60 min after surgical excision. Contralateral samples were taken during surgery. The samples were snap frozen by immersion in isopentane and subsequently in liquid nitrogen, and kept at −80 °C until further use.

Raman spectroscopic mapping experiments and annotated reference spectra
The frozen tissue samples were mounted on a cryotome stage using CryoCompound (KP-CryoCompound, Klinipath B.V., The Netherlands), and 20 µm thick frozen tissue sections were cut, placed on fused silica windows and allowed to dry at room temperature. Raman mapping experiments were performed using a SpectraCell RA Bacterial Strain Analyzer (RiverD International B.V., Rotterdam, The Netherlands). This instrument was designed as a fully automated inverted confocal Raman microscope for analyzing bacterial samples. After modification of the software it was used for point-by-point Raman mapping of tissue sections.
The data processing and data analysis have been described in detail previously [31]. The data analysis software was developed in-house and operates in a MATLAB environment (MATLAB 7.5.0 (R2007b), MathWorks, MA, USA) with the multivariate toolbox PLS-toolbox 7.0.0c (EigenVector Research, WA, USA).
Briefly, about 100 mW of laser light (785 nm) was focused to a spot of 2 µm in diameter on the unfixed, unstained 20 µm thick frozen tissue sections. Selected regions were scanned point-by-point in a 2-dimensional grid with a step size of 5 µm and a spectral resolution of 8 cm⁻¹. For each point a single spectrum was obtained, restricted to the wavenumber range of 400-1800 cm⁻¹. The spectra were grouped using K-means Cluster Analysis (KCA) [32]. By assigning a color to each K-means cluster, pseudo-color Raman images were generated. After the Raman experiments, tissue sections were stained with Haematoxylin and Eosin (H&E) [33]. Comparison of these pseudo-color Raman images with the H&E-stained tissue sections enabled histopathological annotation of the K-means cluster averages as {1} OCSCC, {2} surface squamous epithelium, {3} CT, {4} muscle, {5} adipose tissue, {6} gland or {7} nerve, as described in detail previously [31]. Here, OCSCC refers to the epithelial (keratinocytic) component of the tumor. Based on the observed histopathological heterogeneity, the annotated reference spectra of surface squamous epithelium were subdivided into 3 layers (basal layer, suprabasal layer and superficial layer) and 2 epithelial subtypes (dysplastic and non-dysplastic), as described in detail previously [30]. These histopathologically annotated K-means cluster averages are hereafter referred to as annotated reference spectra. For further analysis, the spectra annotated as {1} were labeled as 'tumor' and the six individual non-tumorous tissues/tissue structures {2-7} were all marked as 'non-tumorous tissue'.
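The clustering step described above can be illustrated with a minimal sketch in Python. It is not the in-house MATLAB software used in this study: the function name `kmeans`, the toy three-channel "spectra" and the color palette are all hypothetical, and a plain Lloyd's algorithm with random initialization is assumed. The returned centroids play the role of the K-means cluster averages that are subsequently annotated.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two intensity vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(spectra, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns (labels, cluster averages)."""
    rng = random.Random(seed)
    centroids = [list(s) for s in rng.sample(spectra, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each spectrum joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(s, centroids[c]))
                      for s in spectra]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [s for s, lab in zip(spectra, new_labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
    return labels, centroids

# Toy data: six "spectra" with three intensity channels each (made up).
toy_spectra = [
    [1.0, 0.1, 0.0], [0.9, 0.2, 0.1], [1.1, 0.0, 0.0],
    [0.0, 0.1, 1.0], [0.1, 0.2, 0.9], [0.0, 0.0, 1.1],
]
labels, cluster_averages = kmeans(toy_spectra, k=2)

# Pseudo-color image: one (arbitrary) color per cluster label.
palette = ["blue", "green"]
pseudo_colors = [palette[lab] for lab in labels]
```

In a real map, each entry of `pseudo_colors` would be placed at the grid position of its point spectrum, so that the image can be compared with the H&E-stained section.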

Development of tissue classification models
One half of the spectral database containing annotated reference spectra of 10 patients (hereafter called the training set) was used to develop the tissue classification models. To ensure an accurate representation of all tissue structures, the training set was created such that each tissue structure was represented in at least three patients. For the development of tissue classification models Principal Components Analysis (PCA) [34] was used, followed by (hierarchical) Linear Discriminant Analysis ((h)LDA) [35].
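For orientation, a textbook formulation of PCA followed by two-class LDA is given below. It is a sketch under standard assumptions (mean-centered spectra, Gaussian classes with a shared covariance matrix) and is not confirmed to be the exact implementation of the software used here; all symbols are introduced for illustration only.

```latex
% X: n x p matrix of mean-centered spectra; P_k: first k principal axes
\begin{align}
  C &= \tfrac{1}{n-1} X^{\top} X, \qquad C\,P_k = P_k \Lambda_k, \\
  T &= X P_k \quad \text{(PCA scores, } n \times k\text{)}, \\
  w &= S_W^{-1}\,(m_{1} - m_{0}), \\
  d(t) &= w^{\top} t + b, \qquad
  b = -\tfrac{1}{2}\,(m_{1} + m_{0})^{\top} w + \ln\frac{\pi_{1}}{\pi_{0}}, \\
  P(\text{tumor} \mid t) &= \frac{1}{1 + e^{-d(t)}} .
\end{align}
% m_1, m_0: class means of the scores (tumor, non-tumorous tissue)
% S_W: pooled within-class covariance of the scores
% pi_1, pi_0: prior class probabilities
```

The number of retained components k is the model parameter that is tuned; in the hierarchical variant each LDA step has its own k.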
In this study, two tissue classification models were developed (Fig. 1).
(1) A PCA-LDA model 'tumor' (OCSCC) versus 'non-tumorous tissue' (i.e. surface squamous epithelium, CT, muscle, adipose tissue, gland and nerve). (2) A two-step PCA-hLDA model 'tumor' versus 'non-tumorous tissue'. In the first step the spectra of adipose tissue and nerve were distinguished from all the other spectra. In the second step the spectra of surface squamous epithelium, CT, muscle and gland were distinguished from the spectra of OCSCC.
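The decision logic of the two-step model (2) can be sketched as follows. The two step classifiers are passed in as functions; the toy stand-ins below use hypothetical features (`lipid_protein_ratio`, `tumor_score`) and a hypothetical threshold, and do not reproduce the PCA-LDA discriminants of this study.

```python
NON_TUMOR = "non-tumorous tissue"
TUMOR = "tumor"

def classify_two_step(spectrum, is_adipose_or_nerve, tumor_probability,
                      threshold=0.5):
    """Hierarchical decision: step 1 removes adipose tissue and nerve,
    step 2 separates OCSCC from the remaining non-tumorous structures."""
    # Step 1: adipose tissue / nerve versus all other spectra.
    if is_adipose_or_nerve(spectrum):
        return NON_TUMOR
    # Step 2: OCSCC versus surface squamous epithelium, CT, muscle, gland.
    if tumor_probability(spectrum) >= threshold:
        return TUMOR
    return NON_TUMOR

# Toy stand-ins for the two trained discriminants (assumptions).
def toy_step1(spectrum):
    return spectrum["lipid_protein_ratio"] > 2.0

def toy_step2(spectrum):
    return spectrum["tumor_score"]

examples = [
    {"lipid_protein_ratio": 5.0, "tumor_score": 0.9},  # lipid-rich
    {"lipid_protein_ratio": 0.5, "tumor_score": 0.8},
    {"lipid_protein_ratio": 0.6, "tumor_score": 0.1},
]
predictions = [classify_two_step(s, toy_step1, toy_step2) for s in examples]
```

Because the lipid-rich spectrum is removed in step 1, its step-2 score is never consulted, which is what allows each step to be tuned separately.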
The best model parameters were selected based on the classification accuracy (the proportion of true results, i.e. both true positives and true negatives) of a leave-one-patient-out (LOPO) analysis. In a LOPO analysis the classification models are built using the data of all patients but one, and tested on the data of the patient that was left out.
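A LOPO analysis can be sketched in a few lines. In this illustration a nearest-class-mean rule on a single made-up feature stands in for the PCA-(h)LDA model; the function names, patient identifiers and data are hypothetical.

```python
from collections import defaultdict

def lopo_accuracy(samples, fit, predict):
    """samples: list of (patient_id, feature, label) tuples.
    Each patient is held out once; accuracy is pooled over all folds."""
    patients = sorted({p for p, _, _ in samples})
    correct = 0
    for held_out in patients:
        train = [(x, y) for p, x, y in samples if p != held_out]
        test = [(x, y) for p, x, y in samples if p == held_out]
        model = fit(train)
        correct += sum(predict(model, x) == y for x, y in test)
    return correct / len(samples)

def fit_nearest_mean(train):
    """Stand-in classifier: store the mean feature value per class."""
    sums = defaultdict(lambda: [0.0, 0])
    for x, y in train:
        sums[y][0] += x
        sums[y][1] += 1
    return {y: total / n for y, (total, n) in sums.items()}

def predict_nearest_mean(model, x):
    return min(model, key=lambda y: abs(x - model[y]))

# Toy data: three patients, one feature value per spectrum (made up).
toy_samples = [
    ("P1", 0.10, "non-tumorous tissue"), ("P1", 0.90, "tumor"),
    ("P2", 0.20, "non-tumorous tissue"), ("P2", 0.80, "tumor"),
    ("P3", 0.15, "non-tumorous tissue"), ("P3", 0.85, "tumor"),
]
accuracy = lopo_accuracy(toy_samples, fit_nearest_mean, predict_nearest_mean)
```

Parameter selection then amounts to repeating this loop for each candidate setting (for example each number of principal components) and keeping the setting with the highest pooled accuracy.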

Validation of tissue classification models
The final tissue classification models were validated using the other half of the spectral database (hereafter called the validation set) that contained annotated reference spectra of the samples of the 11 different patients not included in the training set.
The discriminative power of the tissue classification models was determined by a Receiver Operating Characteristic (ROC) analysis, in which the true positive rate (sensitivity) is plotted against the false positive rate (1-specificity) for different values of the discrimination threshold. The area under the ROC curve (AUC) is a measure of discriminatory power of the tissue classification model [36]. The maximum value for the AUC is 1.0, thereby indicating a (theoretically) perfect model (i.e., 100% sensitive and 100% specific). An AUC value of 0.5 indicates no discriminative value (i.e., 50% sensitive and 50% specific) and is represented by a straight, diagonal line extending from the lower left corner to the upper right. There are several scales for AUC value interpretation, but ROC curves with an AUC of >0.9 are generally interpreted as showing excellent discriminative power, an AUC between 0.8 and 0.9 as good, between 0.7 and 0.8 as moderate and between 0.6 and 0.7 as poor [36]. The Youden index [37], defined as sensitivity + specificity - 1, identifies the discrimination threshold that yields the highest combined sensitivity and specificity. However, the discrimination threshold can be chosen anywhere along the ROC curve, such that it provides the combination of sensitivity and specificity that is of greatest clinical value.
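The ROC quantities described above can be computed directly from classifier scores. The sketch below uses made-up scores and labels (1 = tumor, 0 = non-tumorous tissue); the function names are illustrative and the AUC is obtained with the trapezoidal rule.

```python
def roc_points(scores, labels):
    """(threshold, false positive rate, true positive rate) for every
    distinct score used as cut-off; score >= threshold means 'tumor'."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(float("inf"), 0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((t, fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (_, x0, y0), (_, x1, y1) in zip(points, points[1:]))

def youden_point(points):
    """ROC point maximizing J = sensitivity + specificity - 1 = TPR - FPR."""
    return max(points, key=lambda p: p[2] - p[1])

def threshold_at_full_sensitivity(points):
    """Highest threshold at which no tumor spectrum is missed."""
    return max(t for t, _, tpr in points if tpr == 1.0)

# Toy scores and labels (made up for illustration).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 0, 1, 0, 0]

points = roc_points(toy_scores, toy_labels)
toy_auc = auc(points)
best = youden_point(points)
t_full = threshold_at_full_sensitivity(points)
```

Choosing `t_full` rather than the Youden threshold corresponds to operating at 100% sensitivity and accepting whatever specificity results at that point on the curve.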

Tumor-heat maps
From the PCA-(h)LDA model predictions a posterior probability of being tumor was calculated for each individual point spectrum of each map. By coding this probability as a color between yellow and red, a heat map was generated for each measured Raman map.
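The color coding can be sketched as a linear interpolation between yellow and red. The direction (yellow for a tumor probability of 0, red for 1) and the 8-bit RGB encoding are assumptions made for this illustration; the probabilities are made up.

```python
def tumor_probability_to_rgb(p):
    """Map a posterior tumor probability in [0, 1] to an RGB triple,
    linearly from yellow (p = 0) to red (p = 1)."""
    p = min(max(p, 0.0), 1.0)  # clamp to the valid range
    return (255, round(255 * (1.0 - p)), 0)

def heat_map(probability_grid):
    """Convert a 2-D grid of posterior probabilities into a grid of colors."""
    return [[tumor_probability_to_rgb(p) for p in row]
            for row in probability_grid]

# Toy 2 x 2 map of posterior probabilities (made up).
toy_map = heat_map([[0.0, 1.0],
                    [0.2, 0.8]])
```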

Characteristics of the training set
The training set consisted of 127 Raman maps, obtained from 25 tissue samples from 10 patients. Of these 25 samples, 11 samples contained OCSCC as well as surrounding non-tumorous tissue structures (i.e. surface squamous epithelium, CT, muscle, adipose tissue, gland and nerve), and 14 samples contained non-tumorous tissue structures only. From the 14 tumor free samples, 8 originated from the resection specimen from a region with macroscopically normal-appearing mucosa adjacent to the tumor, and 6 from the contralateral (not-affected) edge of the tongue.
The scanned areas ranged in size between 250 µm × 250 µm and 1005 µm × 470 µm. With a Raman measurement step size of 5 µm this resulted in 2700-18,894 point spectra per Raman map (mean number of spectra per map: 11,421). The optimal number of K-means clusters per map varied between 4 and 20.
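As a quick check of how the step size determines the number of point spectra, the sketch below assumes one spectrum per 5 µm step along each axis and no extra edge point (the function name is hypothetical). Under that assumption the largest training-set area gives exactly the reported maximum of 18,894 spectra, and the largest validation-set area (1005 µm × 420 µm) gives the 16,884 reported for that set. The lower bounds are not checked here, since the smallest scanned area and the smallest spectrum count need not belong to the same map.

```python
def points_per_map(width_um, height_um, step_um=5):
    """Number of grid points for a rectangular scan, assuming one
    spectrum per step along each axis."""
    return (width_um // step_um) * (height_um // step_um)

largest_training_map = points_per_map(1005, 470)    # 201 x 94 grid
largest_validation_map = points_per_map(1005, 420)  # 201 x 84 grid
```

The same function shows why a larger step size shortens a measurement: doubling the step to 10 µm reduces the number of points roughly fourfold.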

Characteristics of the validation set
The tissue classification models were validated using an independent dataset. This validation set consisted of 70 Raman maps, obtained from 19 samples from 11 patients that were not included in the training set. Of these 19 tissue samples, 6 samples contained OCSCC as well as surrounding non-tumorous tissue structures (i.e. surface squamous epithelium, CT, muscle, adipose tissue, gland and nerve), and 13 samples contained non-tumorous tissue structures only. From the 13 tumor free samples, 6 originated from the resection specimen from a region with macroscopically normal-appearing mucosa adjacent to the tumor, and 7 from the contralateral (not-affected) edge of the tongue. The scanned areas ranged in size between 250 µm × 100 µm and 1005 µm × 420 µm. With a Raman measurement step size of 5 µm this resulted in 1050-16,884 point spectra per Raman map (mean number of spectra per map: 9620). The optimal number of K-means clusters per map varied between 3 and 19.
Since previously published results indicated that certain individual non-tumorous tissue structures were easier to distinguish from OCSCC than others [30], a PCA-hLDA model with two consecutive steps was built. In the first step the spectra of adipose tissue and nerve were distinguished from all the other spectra, using three PCs. In the second step the spectra of surface squamous epithelium, CT, muscle and gland were distinguished from the spectra of OCSCC, with an optimal number of 11 PCs. This model showed a maximum classification accuracy of 91%. The ROC analysis resulted in an AUC of 0.95 (Fig. 2). At a sensitivity of 100% this model yielded a specificity of 78%. The misclassifications with this discrimination threshold occurred for surface squamous epithelium (51/62), CT (12/185) and gland (5/9) (Table 1).
Detailed histopathological evaluation of the misclassified surface squamous epithelium spectra revealed that 30 of the 32 spectra annotated as dysplastic epithelium were misclassified. Furthermore, all spectra (23/23) annotated as basal epithelial layers were misclassified. Analysis of the CT misclassifications showed that all misclassified CT spectra (12/12) were obtained from tissue samples containing OCSCC; i.e. from peritumoral stroma.
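The reported specificity can be retraced from the misclassification counts. The sketch below assumes that the non-tumorous validation spectra consist of the 62 surface squamous epithelium spectra plus the 251 spectra from the deeper soft tissue layers mentioned in the Discussion; this total of 313 is an inference for illustration, not a figure stated in the text.

```python
# (misclassified as tumor, total spectra) per tissue type, from the text.
misclassified = {
    "surface squamous epithelium": (51, 62),
    "connective tissue": (12, 185),
    "gland": (5, 9),
}
deeper_soft_tissue_total = 251  # CT, muscle, adipose tissue, gland, nerve

false_positives = sum(n for n, _ in misclassified.values())
non_tumorous_total = misclassified["surface squamous epithelium"][1] \
    + deeper_soft_tissue_total  # assumption: 62 + 251 spectra

specificity = (non_tumorous_total - false_positives) / non_tumorous_total

# Restricting to the deeper soft tissue layers (CT and gland errors only).
deeper_errors = misclassified["connective tissue"][0] + misclassified["gland"][0]
deeper_error_rate = deeper_errors / deeper_soft_tissue_total
```

Under this assumption the specificity evaluates to about 78%, in line with the value reported at 100% sensitivity, and the error rate in the deeper layers to about 6.8%.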

Proof of principle: tumor-heat maps
To demonstrate the differences between the tissue classification models, heat maps were made. A representative example is shown in Fig. 3. Fig. 3A-C shows a mapped area with OCSCC and surrounding non-tumorous CT. Fig. 3D-F shows non-tumorous tissue structures only. For more examples see Supplementary Figs. S1-S3.

Discussion
Raman spectroscopy is a nondestructive, optical technique that does not need pre-treatment or labeling to characterize tissue in real-time [10]. This makes the technique potentially very suitable for intra-operative application. In vivo recorded Raman spectra from intact (bulk) tissue will contain spectroscopic features from all histological structures and layers present in the entire illuminated volume [25]. To provide a solid foundation for the development of a diagnostic tool, spectroscopic knowledge about the entire target volume is mandatory. We therefore focus on the individual spectroscopic features of all histological structures present in tongue tissue. In previous work we investigated the potential of Raman spectroscopy for tongue cancer detection, by distinction between OCSCC spectra and spectra of individual non-tumorous tissue structures [26]. In this study, we used this knowledge to develop and validate 'tumor' versus 'non-tumorous tissue' classification models.
The use of Raman spectroscopy for characterization of normal and malignant tongue tissue was demonstrated by several authors [24,26,38], but an independently validated tissue classification model that distinguishes OCSCC from non-tumorous tongue tissue has not been published before. Singh et al. described a tumor classification model for another oral cavity subsite: buccal mucosa [39]. They performed ex vivo measurements on intact (bulk) buccal mucosa samples. With their validated PCA-LDA tissue classification model, 87% overall accuracy was obtained when discriminating tumor tissue-spectra from normal tissue-spectra [39]. Although Raman results of different oral cavity subsites can show inherent spectral differences [40], the accuracy of this buccal model and our tongue PCA-LDA tissue classification model was comparable. The higher accuracy (of 91%) of our second model can be explained by the use of a multi-step hierarchical LDA. This proved to be more effective than a single step PCA-LDA model in the discrimination between 'tumor' and 'non-tumorous tissue'. The reason is that in a PCA-hLDA model each discrimination step can be optimized separately [35].
Compared to tissue classification models that are based on intact (bulk) tissue measurements, our method uses the spectra of individual tissue structures. This makes it possible to explore the misclassifications of the developed models in detail and gain insight into their clinical relevance. All spectra annotated as basal epithelial layers were misclassified as tumor. This is not surprising because OCSCC originates from the surface squamous epithelium. In carcinogenesis, stem cells located in the basal epithelial layers acquire genetic alterations, followed by clonal expansion [41]. This explains the similarity in biochemical composition (and thus Raman spectra) of surface squamous epithelium and OCSCC. However, it is important to underline that the misclassifications of surface squamous epithelium spectra do not compromise the clinical value of our tissue classification model. Up to 87% of tumor-positive resection margins are located in the deeper soft tissue layers [8], because tumor at the epithelium surface is visible and therefore often adequately resected. A Raman signature resembling squamous epithelium found in the deeper soft tissue is automatically suspicious for OCSCC.
In total 6.8% (17/251) of the non-tumorous tissue spectra that were obtained in the deeper soft tissue layers were misclassified. The majority (12/17) represented CT spectra that were obtained in the proximity (<5 mm) of the tumor (also referred to as 'peritumoral stroma'). This might be explained by micro-environmental stromal changes that occur around a tumor. Tumor development is accompanied by an immune response that leads to tumor infiltration by inflammatory cells [42] and neo-angiogenesis [43]. As a result, the biochemical composition of 'peritumoral stroma' differs from that of CT at a greater distance from the tumor. However, these CT misclassifications are not of clinical concern either. Given that the surgical aim is to remove the tumor with an adequate margin of at least 5 mm of non-tumorous surrounding tissue, these false-positive spectra would in clinical practice not result in unnecessary resection of healthy tissue.
The developed PCA-hLDA classification model could directly be used for objective and automated assessment of frozen tissue sections. Unfortunately, current Raman mapping experiments take too much time to replace the routine frozen tissue section procedure.
In our study, the scanning step size of 5 µm was chosen to obtain molecular information on a (sub)cellular level. Investigation of the detection limit (minimum amount of tumor cells necessary to be detected as tumor) will define whether this step size can be increased to reduce the measurement time without the loss of information. Furthermore, there are several other approaches proposed to reduce the current measurement time. Takamori et al. showed that the total mapping time was reduced by the combination of Raman spectroscopy with auto-fluorescence [44]. Autofluorescence, which has a high sensitivity and speed but low specificity, was used to identify the areas in a tissue section that were suspicious for tumor and needed further detailed classification by Raman spectroscopy.
Use of intact fresh tissue (without making frozen sections) is another way to speed up the evaluation time, as described by Kong et al. [45]. They also used a combination of Raman spectroscopy and auto-fluorescence, and demonstrated that an objective diagnosis of basal cell carcinoma was provided for unsectioned tissue layers, faster than conventional histopathology and without the need for sample preparation. With this reduced measurement time intra-operative evaluation of all resection margins on the surgical specimen and/or in the wound bed is more achievable.
The classification model with an AUC of 0.95 from this study is the basis for further development of a Raman-spectroscopic diagnostic tool which can intra-operatively guide the surgeon to achieve adequate resection margins. Such a tool can add to the surgeon's experience (based on visual inspection and palpation). The cutoff values of desired sensitivity and specificity may vary depending on the a priori probability of suspicious tissue being tumor. A lower sensitivity, and thus a higher specificity, may be accepted when the suspicious tissue to be resected has a low a priori probability of being tumor, combined with an expected functional loss or extended reconstruction.
In this study we developed and validated Raman spectroscopy-based in vitro tissue classification models for discrimination between OCSCC and (subepithelial) non-tumorous tongue tissue. A detailed analysis was made of the misclassifications to gain insight into their clinical relevance. We conclude that the high sensitivity and specificity of the PCA-hLDA classification model would be helpful in achieving adequate resection margins and that such clinical implementation is technically feasible.