Classification of basal cell carcinoma in human skin using machine learning and quantitative features captured by polarization sensitive optical coherence tomography

: We report the ﬁrst fully automated detection of basal cell carcinoma (BCC), the most commonly occurring type of skin cancer, in human skin using polarization-sensitive optical coherence tomography (PS-OCT). Our proposed automated procedure entails building a machine-learning based classiﬁer by extracting image features from the two complementary image contrasts oﬀered by PS-OCT, intensity and phase retardation (PR), and selecting a subset of features that yields a classiﬁer with the highest accuracy. Our classiﬁer achieved 95.4% sensitivity and speciﬁcity, validated by leave-one-patient-out cross validation (LOPOCV), in detecting BCC in human skin samples collected from 42 patients. Moreover, we show the superiority of our classiﬁer over the best possible classiﬁer based on features extracted from intensity-only data, which demonstrates the signiﬁcance of PR data in detecting BCC.


Introduction
According to recent estimates, one in five Americans develop skin cancer in their lifetime [1]. Although not fatal, basal cell carcinoma (BCC) accounts for more than 80% of total incidences of skin cancer [2]. In the current workflow of the US medical system, visual assessment of suspicious skin lesions by a primary care physician (PCP) serves as the basis for referring a patient to a dermatologist for positive determination of BCC in biopsied skin. Among possible treatment options, Mohs micrographic surgery is the most effective and advanced, and involves consecutive removal with immediate evaluation of excised tissue via frozen section histopathology until clear margins are reached. Although proven successful, these diagnosis and treatment procedures suffer from several shortcomings. First, visual assessment of the suspicious lesion by PCP is a subjective and challenging task. Second, biopsy is an invasive procedure with a potentially disfiguring outcome. Moreover, histopathological evaluation of the biopsied skin is a time-consuming procedure. Given the availability of non-invasive BCC treatments such as photodynamic therapy [3] and pharmacological immunomodulation [3], invasive diagnostic techniques such as biopsy should be replaced by non-invasive alternatives. Lastly, Mohs surgery is a time-consuming and labor-intensive procedure, mainly due to the time needed for processing frozen-section histopathology [4]. Thus, there is a significant clinical need for a non-invasive, real-time, automated and reliable technique for BCC detection to assist the PCP to make accurate referrals and to replace the invasive biopsy and time-consuming histology process that contribute to lengthy Mohs operations.
Optical coherence tomography (OCT) is a non-invasive, label-free and non-contact imaging technique that captures volumetric images of subsurface structures of biological tissue [5][6][7]. The close-to-microscopy resolution (7 µm axially and 15 µm laterally), modest imaging depth (1-2 mm in tissue) and near real-time imaging that OCT offers enables measuring changes in skin layers including epidermis and dermis; hence, it is suitable for BCC detection. Previous publications have confirmed the efficiency of OCT to detect BCC in human skin through several in-vivo and ex-vivo clinical studies [8][9][10][11]. These publications have mostly suggested qualitative, rather than quantitative, features to identify BCC regions in OCT images; hence, the process is not automated. The work of Jorgensen et al. [12], is the closest attempt towards automating the BCC detection process; however, the use of visually extracted features from OCT images prevents full automation of their methodology. Moreover, their classifier achieved suboptimal performance (less than 80% specificity and 81% sensitivity), likely due to the low discriminative power of features that are extracted from normal OCT (intensity) images alone. Recently, Schuh et al. [13] attempted to detect BCC in OCT images through a quantitative procedure. However, their methodology requires manual selection of regions of interest and thus is not fully automated. Additionally, Gao et al. [14] recently investigated the discriminatory powers of several extracted quantitative features from OCT images in distinguishing BCC, melanoma and pigmented nevi. However, the accuracy of a final classifier was not reported.
The additional birefringence contrast of polarization-sensitive OCT (PS-OCT) [15,16] can improve the specificity and sensitivity of detection of BCC, as the birefringence properties of normal skin are altered by the onset of BCC tumors [17]. Strasswimmer et al. [17] were the first to report the appearance of aggressive BCC in PS-OCT images and noted smaller values of phase retardation (PR) as a marker for regions showing aggressive BCC characteristics; however, since their preliminary work only included a small number of patients (two), the discriminative power of the proposed marker remains unclear. We recently reported a fully automated support vector machine-based classifier that identifies BCC in ex-vivo mice skin samples from PS-OCT images with high sensitivity and specificity (92.5% and 94.4%, respectively) [18]. The results confirmed that a classifier for BCC based on combined intensity and birefringence features outperforms a classifier based on intensity alone.
In this manuscript, we report a fully automated procedure to detect BCC in ex-vivo human skin (n = 42) from PS-OCT images. Using image processing techniques, we extracted numerous features from both intensity and polarization images. We then used machine learning to optimize the selection of features and to classify images as cancerous (BCC) or healthy using histopathology as the gold standard. The combination of intensity and birefringence features yielded a high sensitivity and specificity of 95.4% and 95.4%, respectively. To the best of our knowledge, the current study is the first fully automated diagnostic for BCC in human skin that uses PS-OCT images, and is the first demonstration of a large collection of PR image features to capture various birefringence properties of skin. The excellent accuracy of the classifier suggests the high discriminatory power of our proposed features, and the superior performance of a classifier including birefringence features suggests the relevance of PS-OCT data to skin cancer detection and possible automated detection of other tissue abnormalities that alter the birefringence properties of the tissue such as bladder cancer, breast cancer, burn depth, etc.

Tissue sample collection
De-identified tissue samples were collected from the Stanford Dermatology Clinic in Redwood City, CA as part of the standard-of-care treatment for BCC (Mohs surgery). The study protocol was approved by Institutional Review Board (IRB) of Stanford University. The samples comprised unused pieces from the first stage of Mohs surgery operations confirmed (by prior biopsy) to contain BCC. For each patient, we collected one to four tissue samples. After excision, the tissue was frozen according to the standard-of-care frozen-section histology process. For some patients we also obtained surrounding tumor-free tissue ("dog-ears"), excised after the tumor is completely removed to facilitate in the reconstruction, which served as part of our healthy controls. Tissue samples were transported to the imaging facility on dry ice, defrosted and imaged by PS-OCT. All were imaged within six hours of excision and one hour of complete defrost.
We collected samples from 61 patients; however, samples from 19 patients were excluded based on the following exclusion criteria: 1) samples were heavily damaged due to unavoidable frozen-section histology incisions (n=7), 2) we were unable to confirm the histology match to a PS-OCT image (n=4), 3) PS-OCT images were heavily corrupted by image artifacts (e.g., photodetector saturation, n=7), or 4) the skin cancer was determined to be squamous cell carcinoma (SCC) and not BCC (n=1). Thus in this study, we included tissue samples from 42 patients.

PS-OCT imaging
The imaging system comprised a single-mode fiber (SMF), swept-source PS-OCT system based largely on a previous design [19]. The PS-OCT engine comprised a 20-kHz swept rate laser (Santec, HSL-2100) centered at 1325 nm. The system has an axial and lateral resolution of 8.9 µm and 12 µm, respectively, and a sensitivity of 99 dB. One or multiple (up to nine) volumes were acquired per sample to cover a majority of the sample. The volumetric data comprised 512 × 256 × 3648 pixels in the X, Y and Z directions, respectively, and had a physical dimension of either 2.1 × 2.1 or 4.2 × 4.2 mm 2 in the XY-plane, and an imaging depth of 1.75 mm in Z.

Histology
Following imaging, we preserved the samples in 10% formalin for at least 72 hours. Afterwards, samples were shipped to a histopathology center for standard histology processing. Using ink marks as guidelines, each tissue sample was cut into 10-µm-thick sections, mounted on slides and stained with hematoxylin-eosin (H&E). A dermatologist (SZA) with histopathology expertise evaluated each slide to identify the presence of BCC.
Histology and PS-OCT images were matched by observing tissue ink marks visible in both images. Cuts and variations in tissue thickness served as additional visual clues for the matching process. An image was labeled BCC if the corresponding histology image contained a BCC tumor and labeled healthy otherwise. Histology was not acquired for dog-ear pieces, which we labeled as healthy, as they had been confirmed to be tumor-free. Based on histology results, our dataset comprised micro-nodular, nodular, superficial, and infiltrative subtypes of BCC.

Dataset construction
All processing and analysis was implemented in MATLAB R2014b. Reflectance, R, and singlepass PR, η, were calculated at each depth position z using standard PS-OCT equations [15]. We averaged adjacent complex Bscan data in sets of two to restrain noise before calculating reflectance and PR. We implemented algorithms involving intensity thresholding to find and exclude Ascans that were corrupted by image artifacts (e.g., photodetector saturation, deep cuts from the frozen-section histology process). Table 1 summarizes the final dataset after post-processing and adjustments. Note that 10 patients contributed both healthy and BCC data, while for a different group of 16 patients we only obtained healthy data, and for another group of 16 patients we only obtained BCC data. Thus in total we included data from 42 patients. Additionally, depending on the physical sample size, the number of collected PS-OCT images varied significantly among patients. However, to avoid biasing our dataset towards patients with a larger number of collected PS-OCT images, we only retained ten PS-OCT images per type (healthy or cancerous), per patient. The smallest number of labeled images for any patient was ten, which set the upper bound for all patients. For patients yielding more than ten labeled images, ten images were picked randomly to keep in the dataset.  Figure 1 shows examples of BCC and healthy PS-OCT images, with the corresponding histology. A healthy PR image ( Fig. 1(b)) shows an increase in the PR values as a function of depth. Based on our observations, the rate of this increase can vary by patient and location on the body. Despite this variation, most PR images collected from healthy skin show roughly uniform PR patterns in the lateral direction. BCCs in human skin manifest diverse patterns in PR images. For example, a uniform pattern (similar to healthy skin) is observed in some subtypes of BCC, such as nodular BCC ( Fig. 1(e)), whereas infiltrative BCC exhibits a different unique pattern ( Fig. 1(h)).
2.5. Selecting the ROI All features were calculated on data in a region of interest (ROI), defined in one of two ways. In both cases, the top boundary was determined by extracting the surface using a greedy algorithm [20] (Fig. 2(a)); the bottom boundary was selected to correspond to either a fixed distance (200 pixels = 940 µm) from the top surface (fixed ROI, Fig. 2(b)) or the deepest pixel attaining an intensity above a fixed threshold (binary ROI, Fig. 2(c)). The value of 200 pixels in the fixed ROI was chosen heuristically to greatly exclude pixels with low SNR. All Ascans within the fixed ROI have the same number of pixels; thus, this type of ROI is more appropriate for extracting features based on Ascans. The binary ROI comprises a binary image constructed from the intensity image by excluding values below a calculated threshold [21]. The binary ROI has the advantage of greatly restraining the noise and unreliable pixels (i.e., with lower SNR). For a given Bscan, each Ascan within the binary ROI may contain a different number of pixels.

Extracting features
Using both intensity and PR data, we extracted numerous image features from PS-OCT images. Our general strategy was to first extract multiple features to capture many properties of the skin and to then search for the subset of features that provides the highest accuracy when used to build a classifier. The features that we extracted can be divided into two main categories: 1) Features extracted from analyzing Ascans within a given Bscan, and 2) Features extracted from analyzing the whole Bscan. Table 2 summarizes the number of extracted features per category and the type of ROI used.

Ascan-based features
All Ascan-based features were calculated from the fixed ROI images. For all features, we first filtered both intensity and PR images using two-dimensional median filters with kernel sizes of [5 40] pixels (lateral vs. axial) and [25 40] for intensity and PR data, respectively. These filters sizes were chosen heuristically to simultaneously smooth the images and reinforce the layered structure of skin. Note that the fixed ROI was calculated from the original image and then applied to the filtered image. Examples of spatially filtered intensity and PR Bscans are presented in Figs. 3(a) and 3(c), respectively. Given the size of our dataset (i.e., 520 Bscan pairs) calculating Ascan-based features for each individual Ascan was deemed too time-consuming. Instead, for a given spatially filtered Bscan, we calculated Ascan features from laterally down-sampled images, comprising every fifth Ascan (since the Bscans were spatially averaged over five pixels for intensity data, and intensity filter has the smaller kernel compare to the kernel for PR data). Figure 3 (Fig. 4(c)); the values of these statistics themselves are the final Ascan-based features, organized into a one-dimensional vector ( Fig. 4(d)). The same procedure was applied to PR images.

a. Long-range intermediate features
To measure the attenuation coefficient [22][23][24][25], we used linear regression to fit a line to each intensity Ascan. We chose the pixel associated with the first peak of the intensity data as the starting pixel for the linear regression to exclude the stratum corneum and only capture epidermis properties. The slope, intercept and fitting error for this line were extracted as intermediate features. Previous publications have demonstrated the use of the slope of the line fitted to the PR Ascan (over a selected ROI) to detect tissue abnormalities and diseases such as basal cell carcinoma [17] and ovarian cancer, and to assess burn depth [26]. However, we chose to fit a higher order polynomial to the PR Ascan based on our observation that a higher order polynomial would provide a better fit. To exclude the stratum corneum, we chose the pixel associated with the first minimum of the PR data as the starting pixel for the polynomial fit. Then we fitted a 5th-order polynomial to these Ascans and extracted the six coefficients of the polynomial and the fitting error as intermediate features. 1) skewness, 2) kurtosis, 3) median, 4) second moment, 5) third moment, 6) fourth moment, 7) standard deviation, 8) relative smoothness [21], 9) uniformity [21]" 10) entropy [21]" 11) height of the largest bin, and 12) the mean value of the bin boundaries associated with the largest bin. For the PR Ascan only, we also calculated the difference between the mean values of the first and second 50 pixels [18].

c. Peaks and valleys intermediate features Pande et al. observed loss of normal tissue layers
as a hallmark for oral malignancy in the hamster cheek pouch [27]. To capture the number of tissue layers (i.e., degree of layering), they proposed features called "peaks and valleys" and "crossings." For a given Ascan, these features provide a measure of the number of prominent peaks in the data. Here we extended their analysis in two ways: 1) by extracting similar features from PR data, and 2) by incorporating information about the axial positions of the peaks and valleys (with respect to the surface) as additional features. The procedure to localize peaks and valleys resembled that of Pande et al. [27], and involved constructing an axially flattened Ascan (black curves in Figs. 5(d) and 5(j)). Note that the axially flattened intensity Ascan was normalized. In Figs. 5(d) and 5(j), the peaks and valleys are marked by blue and green lines, respectively. We denoted the values of the i-th peak and j-th valley by P i and V j , respectively. We then located the i-th local maximum (l ma x,i , blue upward-pointing triangle) and j-th local minimum (l min, j , green downward-pointing triangle) as illustrated in Figs. 5(e) and 5(k) for intensity and PR Ascans, respectively, and denoted their axial positions by z ma x, i and z min, j . The following parameters were calculated as intermediate features for both intensity and PR Ascans: 1) P i [27], 2) P i − V i [27], 3) P i + V i [27], 4) number of located peaks (i.e., #P i ), 5) number of located valleys (i.e., #V i ), 6) standard deviation of z ma x, i , 7) standard deviation of z min, j , 8) value of the first local maximum (l ma x,1 ), 9) value of the second local maximum (l ma x,2 ), and 10) value of the first local minimum (i.e., l min,1 ). Additionally, for the PR Ascan only we extracted the value of the second local minimum (l min,2 ) as the last feature.

d. Segment intermediate features
We defined segments as the portions of the Ascan bounded between each pair of consecutive local minima-local maxima. Upon visual observation, we recognized that segment patterns such as slopes, lengths, or total number of segments for a given Ascan, correlated with the state of the tissue (i.e., healthy vs. BCC). This correlation was especially strong for PR segments. Thus we extracted the segment patterns as another set of intermediate features.
To some extent, the segment patterns capture local tissue properties, assuming a segment corresponds to a tissue layer or an axial change in tissue properties. Thus using linear regression, we fitted a line to each segment, as illustrated by the dashed red lines in Figs. 5(e) and 5(k). Denoting the slope of the fitted line to the i-th segment by S g,i , the number of pixels within the i-th segment by S l,i (i.e., length of the i-th segment), and the total number of segments by N, we extracted the following intermediate features: 1) number of segments with positive slopes (i.e., #S g,i for S g,i ≥ 0) , 2) number of segments with negative slopes (i.e., #S g, i for S g,i < 0), 3) mean of slope values for segments with positive slope values (i.e., mean of S g, i for S g, i ≥ 0) , 4) mean of slope values for segments with negative slope values (i.e., mean of S g, i for S g, i < 0) , 5) S g,1 , 6) S g,2 , 7) S g,3 , 8) S l,1 , and 9) S l,2 . Four additional intermediate features were extracted for PR Ascans only: 10) weighted average of slope values defined as 1 N i S g,i S l, i , 11) S g,4 , 12) S l,3 , and 13) S l,4 . Note that each Ascan could comprise a different number of segments. For consistency, we decided to only capture the slopes and lengths associated with the uppermost segments, as they represent the properties of the important skin layers for BCC detection such as epidermis. Additionally, our observation suggested that in general the PR Ascan comprised more segments than the intensity Ascan; therefore, we included more PR segment properties as intermediate features.

e. Crossing intermediate features
We followed the same procedure as Pande et al. [27] for defining crossing intermediate features, which also aim at capturing the degree of tissue layering. The flattened Ascans were demarcated by 15 levels [27], illustrated by horizontal dashed lines in Figs. 5(f) and 5(l). We then defined C i to be the number of times the flattened Ascan intersects level i. The following 19 parameters were defined as intermediate features: 1-15) C i for i = 1...15 , 16) mean of C i , 17) median of C i , 18) standard deviation of C i , and 19) mode of C i .

a. Basic statistics
The following basic statistics [21] were used to define the first set of Bscan-based features for both intensity and PR data: 1) range, 2) standard deviation, 3) mean, 4) median, and 5) mode. These features were calculated only from values within the binary ROI.
b. Histogram statistics This set of features was calculated using the histogram of the intensity and PR data within the binary ROI. The intensity and PR histograms were constructed by  binning the corresponding values into 64 and 16 bins, respectively. The twelve histogram features described in section 2.6.1(b) were extracted for both intensity and PR Bscans.
c. Texture features Texture features of an image quantify properties such as homogeneity, coarseness and contrast [29]. In particular, we chose to use the gray-level co-occurrence matrix (GLCM) method [28], a statistical method for analyzing texture properties of an image. It considers the occurrence of certain gray-level pixel pairs with a specific spatial relation in the image and then extracts texture features from the statistical properties of this matrix. We extracted texture features from both intensity and PR data by computing GLCM at 1-, 2-, 4-and 6-pixel distances and for the three directions of south, east and south east. For calculating each GLCM, the intensity data within the binary ROI were scaled into eight uniform levels, yielding a GLCM of size 8 × 8. Similarly two GLCMs for the PR image were constructed by scaling the data within the binary ROI into eight uniform levels with respect to two ranges: 1) the constant range of 0 to π 2 (the PR values are always bound to this range) and 2) the actual range of PR values in the binary ROI. The following properties became the texture features [28]: 1) contrast, 2) correlation, 3) energy, and 4) homogeneity. Thus in total we calculated 48 (4 distances, 3 directions and 4 properties) and 96 (4 distances, 3 directions, 2 ranges and 4 properties) texture features from intensity and PR data, respectively.

d. Morphological features
Proposed by Garcia-Allende et. al. [30] for OCT images, morphological analysis aims at statistical study of the intensity distributions of image regions. For a given Bscan, regions are constructed by segmenting the intensity values achieved by utilizing the k-means algorithm [31] to partition data points into k clusters through an iterative process. Each data point is ultimately associated with the cluster for which the data point has the smallest distance to the cluster's centroid. Figures 6(a) and 6(b) illustrate examples of intensity and PR images comprising four clusters. We extended the morphological analysis of Garcia-Allende et. al. by incorporating the positional information associated with pixels within each image region in both lateral and axial directions. The traditional and extended analysis were carried out on both intensity and PR data, because we noticed that the shape and spatial extent of the PR regions correlated with the histology-based classification of the Bscan (i.e., healthy or BCC) in some cases. For both image types, we calculated morphological features by segmenting data within the binary ROI into two through six regions. The following properties were calculated from each region: 1) mean (i.e., centroid), 2) normalized mean (i.e., centroid divided by the range of the values in the binary ROI), 3) absolute deviation (i.e., sum of absolute values of all point-to-centroid differences), 4) relative size (i.e., total number of data points in each region divided by the total number of data points in the ROI), 5) standard deviation, 6) skewness, 7) kurtosis, 8) mean of axial positions (i.e., z values), 9) median of axial positions, 10) standard deviation of axial positions, 11) mean of lateral positions (i.e., x values), 12) median of lateral positions, and 13) standard deviation of lateral positions. Thus in total we calculated 520 features (20 image regions, 13 features and 2 data types (intensity and phase retardation), based on morphological analysis.

e. Spatial frequency-based features
Features computed from the 2D-DFT of an image capture the periodicity and orientation of image textures. The periodic and high PR-value bands observed in some of our PR images inspired us to extract spatial frequency-based features. The absolute 2D-DFT for each image type was first calculated [28] and then divided into regions based on spatial frequency content. We chose four regions of concentric rectangles by dividing each spatial axis into four uniform bands such that the bands have similar spatial frequencies in both axes. Feature were extracted by summing all the values in each rectangular region and normalizing them by dividing by the sum of the total value across all four regions. The features calculated from the innermost and outermost rectangular regions correspond to texture properties of the image with low and high periodicity, respectively.

Final set of features
Ultimately we calculated 1314 features: 613 from intensity data and 701 from PR data. The final feature matrix was sized 520 × 1314 (images x features), and the label vector describing the final classification outcome (healthy vs. BCC) had a size of 520 × 1. The feature matrix was normalized in mean and variance.

Building the classifier
Based on the features captured from the images, we constructed a classifier that predicts the presence of BCC in a set of PS-OCT images. We chose to employ supervised learning techniques, which utilize input data (i.e., training dataset) carrying known desired outputs (i.e., class or label) to identify the general function that maps input data to those outputs and works well for new input data (i.e., testing dataset).
To improve the predictive power of the classifier, we followed procedures to select the best subset of features, the best machine learning algorithm, and to adjust the algorithm's parameters. Immediately, we excluded features whose values were the same for all Bscan data from all patients, since they carried no useful information for classification purpose, leaving 1144 features from 1314 original features (520 intensity and 624 PR features ). To select the best subset of features, we implemented the minimal-redundancy-maximal-relevance (mRMR) algorithm [32] to identify features having high relevance (i.e., correlation) to the label vector and low correlations (i.e., redundancies) with other features. The benefits of building a classifier from fewer features are two-fold: 1) avoiding overfitting of the classifier's parameters to the current data, and 2) reducing the computational time. For each test classifier, we chose a subset of m features, S m R M R, m , consisting of the top m high-rank features. We then tested several machine learning algorithms to determine which algorithm yielded the classifier with the highest accuracy. The algorithms tested included support vector machine (SVM) with linear and Gaussian kernels, k-nearest neighbor (KNN), and random forest. At this stage, each classifier was built using all features from S m R M R, m and was optimized in accuracy by adjusting its parameters. As the final step, using forward search algorithm [33] and the selected machine learning algorithm, we selected the final subset of features, S f wd , that yielded the highest possible accuracy for the selected classifier. The forward search algorithm begins with an empty set of features (i.e., S f wd = {φ}) and upon each iteration successively adds the single feature F i from the original feature set (i.e., S m R M R, m ) whose inclusion leads to the highest accuracy.
Instead of the standard leave-one-out-cross-validation (LOOCV) or K-fold cross-validation (K-fold CV) [34], we evaluated the accuracy of classifiers constructed at different steps using a leave-one-patient-out-cross-validation (LOPOCV) to avoid biasing our estimate of accuracy due to potential correlations among images from a single patient. At each iteration, the classifier is trained on feature vectors from all patients except one and tested on the remaining patient. We calculated four parameters to assess our classifier's performance: 1) accuracy, 2) sensitivity, 3) specificity, and 4) area under the receiver operating characteristic (AUC).

Results and discussion
We constructed three feature sets using the top 100, 200, and 300 high-rank features (sorted by mRMR): namely S m R M R,100 , S m R M R,200 , and S m R M R,300 , respectively. The numbers 100, 200, and 300 were chosen arbitrarily. For each set, we implemented various machine-learning algorithms (section 2.7) while adjusting their parameters to select the algorithm that achieved the highest classification accuracy. Ultimately, SVM with Gaussian kernel of σ = 4 (SVM_GS4) constructed from the S m R M R,100 feature set provided the highest accuracy. For this classifier, the achieved accuracy, sensitivity and specificity were 90.2%, 91.2%, and 89.2%, respectively.
Up to this point, the selected feature subset (i.e., S m R M R,100 ) had only been optimized with respect to the label vector and the original feature set, but not with respect to the selected machine-learning algorithm. Using the forward search algorithm we further optimized the feature subset by selecting only those features that were optimized for SVM_GS4 (S f wd ). This final future set should yield the most accurate classifier when used with SVM_GS4. We ran the forward search algorithm on S m R M R,300 with 94 and 206 intensity and PR features, respectively. Figure 7(a) depicts results of the forward search process as a function of the selected feature at each iteration. For all iterations, the algorithm used was SVM_GS4 and the validation method was LOPOCV. As the graphs show, the classifier's performance improves upon adding new features only until 36 features have been added (marked by the dashed black line), where the achieved accuracy, sensitivity and specificity are all 95.4%, and the AUC is 97.2%. Thus, a classifier built using those 36 features (S f wd : 4 intensity and 32 PR) and SVM_GS4 achieves the best performance. Table 3 summarizes the final 36 selected features by category. A total of 22 and 14 features were selected from Ascan-based and Bscan-based features, respectively. The 13 Ascan-based features in the short-range category and the 9 Bscan-based features in the morphological category have the highest representation. This can be attributed to either their relevance for BCC detection or to their high representation in the original feature set ( Table 2). Note that exclusion of certain features in the final feature set does not in general diminish their importance or value for BCC detection: the final feature set is highly optimized with respect to the chosen machine-learning algorithm, and a different algorithm may have required a different feature set to achieve optimal accuracy. For this reason, having a large number of initial features offers the freedom and convenience of improving the classifier's performance by selecting the subset of features that is optimized for a given algorithm.
The current results are for a classifier based on both intensity and PR features. To consider whether PR data are necessary to attain high classifier performance, we built another classifier using intensity-only features and followed the same optimization steps (i.e., mRMR, algorithm selection, and forward search) to identify the most accurate intensity-only classifier. We determined a classifier built using SVM_SG5 with a subset of 18 intensity features (S f wd , results are depicted in Fig. 7(b)) to yield the highest accuracy. This intensity-only classifier achieved 83.1%, 81.9%, 84.2%, and 90.8% accuracy, sensitivity, specificity, and AUC, respectively, which suggests intensity-only features yield a classifier with suboptimal performance and do not carry enough discriminatory power to detect the presence of BCC tumors. On the contrary, the strong performance of the classifier based on both intensity and PR data clearly demonstrate the power of PR data in detecting BCC tumors and thus the relevance of PS-OCT in detecting BCC in human skin. Figure 7(c) shows the receiver operating characteristics (ROC) achieved by classifiers based on the final intensity-only or PR and intensity features. The ROCs clearly support that a classifier built based on intensity-only features underperforms the other classifier.  Although we only have demonstrated the efficacy of our proposed workflow for automated detection of BCC in human skin using PS-OCT images, we hypothesize the proposed features and the process of building and optimizing a classifier can be generalized to detecting any tissue abnormality using any type of OCT. This generalizability is mainly due to the abundant diverse features proposed and the applicability of the proposed process for building and improving a machine-learning based classifier. One disadvantage of the proposed workflow is the high computation time. Extracting features and running the forward search algorithm were the two most time-consuming steps: 17 seconds for extracting features per pair of intensity and PR images (total of 2.5 hours for all 520 PS-OCT images), and 26 hours for completing the forward search (Note that computation times are based on using a laptop with 2.5 GHz Intel Core i7 CPU and 16 GB 1600 MHz DDR3 memory). In fact, since forward search is an iterative process, selecting even the first final feature from the original feature set of 300 features takes as long as training 300 different classifiers. That said, extracting a huge number of features and running the forward search algorithm needs to be executed only during construction of the classifier; once the classifier is built, testing new samples in the clinic is quick (0.001 seconds) as they need only be tested against a small number of features (36).
In this work we did not account for the potential effect of the freezing process on the birefringence and scattering properties of our samples. Future studies may be necessary for careful evaluation of the potential changes for continued ex vivo work; however, we believe that the proposed workflow for automated detection is general and powerful enough to capture the presence of BCC tumors in unfrozen and even in-vivo human skin samples using appropriate training data.

Conclusions
In conclusion, we have successfully demonstrated the first automated detection of BCC in human skin using PS-OCT. Our classifier provides an unprecedented performance of 95.4% accuracy, sensitivity and specificity in detecting BCC in ex-vivo human skin. The proposed classifier was constructed by extracting our proposed new features in addition to features previously proposed by others from 520 PS-OCT image pairs with complementary contrasts (intensity and PR) from 42 patients.
Although our current implementation for building the classifiers is time-consuming, more efficient software platforms such as C++ or R could be used to reduce the time. Future clinical translation will require evaluation on in-vivo human skin samples; to accomplish this, one needs a portable and hand-held PS-OCT able to image parts of the bodies that are hard to reach by conventional stationary probes such as face, neck, and shoulder. Additionally future studies are required to focus on discriminating BCC from benign skin abnormalities (e.g., nevi), for which accessibility to a large population of diverse human skin samples is required.
Nevertheless, our results support the strong relevance of PS-OCT for detecting BCC in human skin, as our proposed classifier outperformed even the best possible classifier based on intensity-only images. Additionally our proposed workflow for automated detection can be generalized to detecting any other tissue abnormalities using any type of OCT system.

Funding
This work was partially funded by a Stanford Bio-X Seed Grant (IIP6-87). Tahereh Marvdashti was partially financially supported by a Stanford Graduate Fellowship.