On the spectral signature of melanoma: a non-parametric classification framework for cancer detection in hyperspectral imaging of melanocytic lesions

: Early detection and diagnosis is a must in secondary prevention of melanoma and other cancerous lesions of the skin. In this work, we present an online, reservoir-based, non-parametric estimation and classiﬁcation model that allows for this functionality on pigmented lesions, such that detection thresholding can be tuned to maximize accuracy and/or minimize overall false negative rates. This system has been tested in a dataset consisting of 116 patients and a total of 124 hyperspectral images of nevi, raised nevi and melanomas, detecting up to 100% of the suspicious lesions at the expense of some false positives.

The field of Applied Photonics (a key enabling technology (KET) as defined by the High-Level Expert Group of KETs of the European Commision [7]), has proven itself capable of minimizing both problematics. In particular, hyperspectral imaging (HSI) has shown promise for the detection of malignant lesions, given the rich chromophore-related information provided by radiation backscattered from tissues. Macroscopic HSI has been thoroughly employed in tissue classification [8][9][10][11][12][13], lesion segmentation [14,15], and quantification of detectable chromophores [16,17], among others (for further discussion on the capabilities of HSI as a clinical technique, please refer to reviews [17], [18], [19] and [20]). In the field of dermatology, multispectral digital skin lesion analysis (MSDSLA) devices are capable of detecting inflammation [21], segmenting and differentiating between malignant and benign melanocytic and non-melanocytic lesions [22][23][24][25][26][27][28][29]. Chromophores present in melanocytic lesions have been evaluated in large spectroscopic studies as well [30]. Assisted diagnosis has already been tested for multispectral images in commercially available systems such as SiaScope or MelaFind, with improvements in sensitivity and specificity in the range of 5-10% [20,31] and biopsy sensitivity and accuracy of up to 20% (from 64% to 86%) when used complementarily with standard clinical dermatology practice [32].
Most diagnostic methods in HSI imaging use well-known, standard machine learning algorithms, such as Support Vector Machines (SVMs), deep neural networks (both convolutional and feedforward), a combination of dimensionality reduction by Principal Component Analysis (PCA) and a clustering/classification algorithm (i.e. k-means, k-nearest neighbors), plain correlation [19], or a combination of these methods [33]. Unfortunately, algorithms that perform dimensionality reduction -such as PCA or the Singular Value Decomposition (SVD)-struggle with large datasets due to the computational complexity of the SVD. Neural networks and SVMs, on the other hand, rely on obtaining a hidden function that translates the problem domain into the solution domain by means of optimization, achieving classification. These two issues -i.e., the quadratic complexity of dimensionality reduction algorithms with respect to dataset size and being unable to easily determine the proper significance and clinical exploitability of any automated diagnosiscould complicate the usability of these methods in secondary prevention, and therefore should be avoided. Also, the advent of new MSDSLA systems in recent years call for a simple, concise detection protocol that allows the use of any molecular imaging system to assist in a clinical setting [32].
In this work, we describe and test an algorithm capable of dealing with large lesion datasets, reducing dimensionality while taking into account all incoming samples, and obtaining a metric or function that has a transparent, simple relationship between any pathology and its signature features, while also allowing for the classification of any additional lesions quickly and accurately, following a specified clinical policy. This combination of well-known algorithms is based on the Neyman-Pearson Lemma -the basis of the classical radar detection problem-and, given its constitutive elements, can be described as a reservoir-based, online, multidimensional non-parametric probability density estimation and classification framework on large datasets.
The proposed approach on the classification problem begins with finding a vector basis that can explain the dataset as a whole and reduce its dimensionality so that, through random population subsampling, it is possible to obtain statistically significant populations for each particular pathology under study, large enough to adequately interpolate the probability density function (PDF) of each tissue category in the low-dimensional, newly found feature space. Said PDFs are obtained and used to specify the most likely features or spectral signatures of each particular pathology. The interpolation -or, rather, estimate-of the PDF is calculated with kernel density estimation (KDE), and a Likelihood Ratio Λ(x) and threshold parameter γ are established in such a way that detection is maximized. This PDF estimate is evaluated, then, on incoming new pixels, and the likelihood ratio is employed in their classification. Different applications have validated the feasibility of KDE for segmentation, in the fields of home range analysis [34], visual surveillance systems [35] as well as hyperspectral remote sensing [36] and ship detection via synthetic aperture radar [37].
The combined action of population subsampling and density estimation of each population allows for the classification of hyperspectral images with tunable sensitivity and specificity, as the problem requires. In our case, we wish to either minimize false negatives at the expense of a few false positives (to minimize misdetection) or maximize overall accuracy.

The melanoma HSI dataset
In order to test the proposed methodology, we require a multispectral dataset of pigmented lesions. The selected hyperspectral database was generated by Spigulis, Lihacova (formely Diebele), Kuzmina et al. during several clinical trials in a collaboration between the Biophotonics Laboratory of the Institute of Atomic Physics and Spectroscopy and several clinics in Riga, Latvia [22,23,38,39]. This dataset consists of 116 patients (52 nevi patents, 33 raised nevi patients, 31 melanoma patients). In some nevi and raised nevi patients, more than one lesion has been imaged. The total number of samples is 124 (59 nevi, 35 raised nevi, and 31 melanomas). Table 1 describes the number of patients and number of lesions imaged per patient. There are no patients with more than one type of lesion (i.e. patients with benign and malignant lesions simultaneously).
For every patient, a concise protocol was followed, for which about 30 minutes were required per lesion, of which up to 2 minutes were used for long-exposure acquisition [22]. Nevi were inspected by dermatologists, whereas melanomas also underwent histological examination to prove their malignancy, as usual in these types of clinical studies [31,40]. The images vary in spatial resolution, but are consistent in spectral resolution per pixel. The device utilized was a 51-channel, 16-bit Nuance EX hyperspectral camera, which operates in the Vis-NIR range (450-950 nm), with a step of 10 nm. Optical density, OD, was directly calculated by Nuance software as a function of reflectance R(λ) via the well-known equation with I(λ) the received light intensity at the sample, and I 0 (λ) a reference spectrum, in particular a sheet of paper applied over the lesion [41]. The white reference image defines the exposure time required to maximize dynamic range in the hyperspectral system. That same exposure time is then used on the lesion itself [22]. Given the fact that there were no regions of interest (ROIs) in the dataset itself, two rectangular regions of interest were generated manually for each specimen, such that -for each of the samplesone ROI was placed within the melanocytic lesion, and the other in a surrounding area, where there is visibly no lesion present. These ROIs (in-lesion and out-of-lesion) do not intend to provide clinical nor spatial information, as they just serve as locations for our method to extract spectra from within and outside each pigmented lesion. Special care was taken in order to keep the ROIs as clean and far away from fringe regions as possible. Also, in order to avoid any classification performance biases, we will evaluate the algorithm on a patient-by-patient basis, instead of analyzing samples separately, mimicking as realistically as possible what would happen in a real clinical setting.

Standard normal variate (SNV)
Interpatient -or intersubject-variability, i.e. the inconsistency in measurement characteristics amongst subjects that should a posteriori be classified as identical [42], is a prevalent phenomenon in fields such as hyperspectral imaging. Changes in position and illumination make non-contact diagnosis of material properties notably difficult in many cases [43]. In order to compensate for this variation, we chose to employ the standard normal variate (SNV), known for eliminating bias and trend in spectra [44]. In obtaining the SNV, each reflectance vector x ∈ R n will be converted into a normalized vector x , by means of subtracting its average value µ = 1 N N i=1 x i and dividing by its variance: x = x − µ σ . (2)

Sequential singular value decomposition (SVD)
Consider a corrected hyperspectral matrix as a tensor A ∈ R m×n×l with real elements {a mnl } = R(m∆x, n∆y, λ 0 + l∆λ) representing the diffuse surface reflectance of the material under test at discrete positions (m∆x, n∆y) and wavelengths λ 0 + l∆λ (with ∆x, ∆y, ∆λ the spatial and spectral resolutions established by the system, respectively), and A = A (3) ∈ R l×mn as its mode-l matrization, such that each column in A represents the spectrum of a single pixel. In other words, if we ignore pixel position and only pay attention to its spectral characteristics, the singular value decomposition allows the representation of A as a sum of matrices of rank one with r = rank(A) the rank of our matrix, u 1 , . . . , u r the left singular vectors of A (columns of orthogonal matrix U), σ 1 , . . . , σ r its singular values (nonzero elements in diagonal matrix Σ), and v 1 , . . . , v r its right singular vectors (columns of orthogonal matrix V). This means that the SVD allows for lossy compression of spectra (i.e. dimensionality reduction) by means of an L-rank approximation, accomplished by truncating the sum in Equation (4). The error of this approximation is known; it is given by the next singular value, such that ifÃ will hold true for any A [45]. Thus, if there is notable redundancy in the columns of A, it will be possible to represent each column a t as a weighted sum of the first L r columns of U, namely where ., . represents the inner product of any two column vectors. The coefficients α t,1 , . . . , α t, L are the coordinates of the t-th column vector of A in the space described by the first L vectors of U in Equation (3).
Finding the basis for a single image is not a problem from a practical perspective; the total number of operations required for executing the SVD on a matrix A ∈ R M×N is O(M N 2 ), while using O(M N) memory. These algorithms, though, become problematic as the number of column vectors N increases, which is something that is expected to happen when dealing with large datasets. That, and the fact that we are looking for an L-rank approximation with a very small L, calls for an algorithm that efficiently calculates the first few singular vectors of a matrix composed by concatenated matrices A = (A 1 | A 2 | . . . ) by analyzing each matrix separately.
The sequential Karhunen-Loeve (SKL) algorithm, which is based on the R-SVD algorithm, can be of good use for this purpose. Given a matrix A ∈ R M×N , the SKL algorithm obtains -by taking the columns of A in batches of P columns-the first K columns of U in Equation (3) with negligible errors, in O(M NK) operations and using O(MK) space [46]. Thus, using an implementation of the SKL algorithm, and by specifying a value of K sufficiently large such that L fulfills K > L, but also K, P N 1 + N 2 + . . . , it is possible to find the first K columns of U that can represent a complete dataset in a feasible manner.
Once we have a vector basis of size K, we may choose L by looking at the relative contribution (singular-value-wise) of each additional dimension to the trace of Σ (and, thus, the quality of the approximation). Instead of selecting a fixed value of L, once the SKL algorithm has provided us with the K-rank approximationÃ K = U K Σ K V T K , we may look for the value where becomes negligible, e.g. where σ contrib ≤ 10 −3 , which is a common dynamic threshold for dimensionality reduction [10,14].
A projection of each spectrum a t in the newly found feature space given by u 1 , . . . , u L as in Equation (5) provides us with x t = (α t,1 , . . . , α t, L ) T which will correspond to the coordinates of spectrum a t in a feature vector space of reduced dimensionality that explains with high fidelity most of the properties of all tissue samples in the dataset. This vector basis will be obtained from the ROIs specified in the previous section.

Kernel density estimation
After each spectrum is located in a vector space common to all subjects, the next step is to assume that each pixel x t is a sample from an n-dimensional random variable X, which is assumed to have a different response to each given diagnostic hypothesis X |H 0 , . . . , X |H m , where m is the number of hypotheses. Specifically, this work evaluates 3 different hypotheses: H 0 for nevus; H 1 for melanoma and H 2 for healthy skin.
Ideally, we seek to obtain the probability density function of this variable given each hypothesis, namely f (x|H 0 ), . . . , f (x|H m ). In other words, f (x|H 1 ) represents the value of the likelihood of x ∈ R L belonging to a specific hypothesis or tissue type, in this case H 1 , melanoma. If we are given a sample set of n pixels given the k-th hypothesis, X |H k , i.e. X k,1 , . . . , X k,n , the probability density function (PDF) f (x|H k ) at position x can be estimated using a multivariate kernel density estimator [47]:f and here K(.) is referred to as the kernel function. Since the reference vectors are more determinant than the kernel function in terms of accuracy, we use the standard multivariate normal kernel and here d is the dimension of the vector space and H is the bandwidth matrix. For our system, Silverman's rule-of-thumb estimator was used [47], which gives the optimal window with for the smoothing of normally distributed data with variance σ i on each dimension, and so H is a diagonal matrix whose nonzero elements correspond to and, since our dimension is given by the SKL algorithm, d = L.

C++ KDE implementation
Since kernel density estimation requires a total of n operations per reservoir (given by Equation (7)), KDE was implemented in C++ with hardware acceleration libraries, in order to reduce overall execution runtime. For this implementation, both the GSL (GNU Scientific Library) and the OpenCL (Open Computing Language) GPU (Graphics Processing Unit) APIs (Application Programming Interfaces) were used. Verification of estimation accuracy was performed by comparing the numerical differences between the results given by the hardware-accelerated implementation and other high-level language implementations (Python 3 and MATLAB) on random RGB images, using a computer with an Intel i7 6700 processor (8-core, 3.4 GHz GPU) and an nVidia GeForce GTX 1060 (3GB RAM) graphics card.
As a result of hardware acceleration, it was possible to obtain an average speedup factor of 45×, and a negligible average relative numerical error in the order of 10 −7 percent. A secondary test was carried out where CPU (Central Processing Unit) multithreading was used, obtaining a 5× speedup on average. The GPU-accelerated method was selected as preferable and wrapped accordingly for its use with the other Python3-implemented functions, which allowed us to generate PDF estimates of whole images in the order of milliseconds.

Reservoir sampling and Algorithm R
For a set of q pixels under analysis, and n reference pixels for each hypothesis, the number of operations specified in Equation (7) are in the order of qn products of vectors of dimension d. As the resolution of the image and/or the number of reference pixels increases, the operation becomes unfeasibly large, and so we must reduce either quantity, even when recurring to parallelization.
Random sampling -the selection without replacement of a random sample of size n from a pool of N n entries-is useful in this context. A reservoir is just the result of performing random sampling: given a complete set of samples S, |S| = N, a reservoir is a subset R ⊂ S with size |R| = n. The main property of an adequate reservoir is that the probability of any vector in the pool to belonging to the reservoir is a constant, namely P(s ∈ R) = n/N, ∀s ∈ S.
There are several algorithms that allow this with more or less optimality, but for our case the simplest (Algorithm R) was employed, due to current processor speeds and random number generation not being the bottleneck of the system. In Algorithm R, the reservoir R is initialized with the first n entries of S (i.e. R = {s 1 , . . . , s n }) and then the next n + 1, . . . , N entries are read consecutively. For each entry s t , t ∈ [n + 1, N] in S, we obtain a single sample of i, an integer, uniformly distributed random variable in the interval [1, t]. The i-th entry in R is substituted by s t if i < n. By following this procedure, all the elements in S are read only once, enabling us to continue with an online approach [48].

Naïve Bayesian and weighted one-versus-all (OVA) classification
Given a newly acquired spectrum x t ∈ R L for which probability density function estimates for each tissue type f (x t |H 1 ), f (x t |H 2 ), . . . , f (x t |H m ) have been calculated, we may proceed by classifying such position by a maximum a posteriori (MAP) classifier: Thus, a pixel x t is most likely to belong to class H k if the estimated value of the likelihood functionf (x t |H k ) at such position (times the a priori likelihood of the k-th hypothesis to be true) takes the largest value. If we assume that all hypotheses are equally probable, then Equation (10) becomes which is the equation corresponding to the Maximum Likelihood (ML) classifier [49]. We can therefore use the one-versus-all (OVA) criterion for pixel classification. A pixel x t will be assiged a class b t ∈ {H 0 , . . . , H m } by following the OVA equation which can be weighted in order to change the probability of classification for any particular class, by including real-valued weights γ 0 , . . . , γ N ∈ R to the classifier: For the problem at hand, γ i = 1 except for the melanoma class, which will be the varying parameter that sets, in practice, true positive and false positive rates.

Online interpatient-invariant property learning
The implementation of the reservoir-based classifier was achieved by the interconnection of the five different functions described so far, as shown in Fig. 1. The system has two modes of operation: sample learning and sample evaluation. We will define m as the number of classes -hypotheses-pertaining to the problem (in our case, categories for nevi, melanoma and healthy skin.) During sample learning -depicted by the blue arrows in Fig. 1-, spectra-label pairs (a t , l t ) are acquired in batches of size P. The SKL algorithm is then applied on the P-sized batch of spectra, ignoring their labels. Low-dimensional feature space basis vectors u 1 , . . . , u K are updated after every batch. After this calculation, each pixel in the batch is sent, without reducing its dimensionality, into its corresponding size-n reservoir R 0 , . . . , R m (depending on its label), where algorithm R selects a subset of them randomly, generating a statistically significant population of each class, {r 0,i }, . . . , {r m,i }. Pixels that have not been randomly selected are discarded in each reservoir.
Secondly, during sample evaluation (arrows in magenta), L < K is recalculated dynamically by means of Equation (6). Pixels in each reservoir are projected onto the final vector basis u 1 , . . . , u L calculated with SKL, obtaining the reference points for each class in the L-dimensional vector space, {X 0,i }, . . . , {X m,i }. Once the n-sized reservoirs are represented in the updated feature space, a new incoming pixel a t can be processed. This pixel is also projected onto the low-dimensional vector space, and its likelihood of belonging to each class is estimated ( f (x t |H 0 ), . . . , f (x t |H m )). Classification follows suit, using Equation (13) (or, also, the MAP/ML classifier), obtaining its estimated tissue labell t .
These two steps describe, therefore, an on-line system that obtains a low-dimensional feature space that represents features in the whole dataset and, at the same time, the most frequent locations of these point clouds in this feature space are determined by the PDF estimates. A new pixel is most likely to belong to the class with the highest PDF value at its position in feature space, and therefore its assigned class will follow that rule.
General schematic description of the workflow and modes of operation implemented for the reservoir-based kernel density classifier. During learning (in blue) spectra-label pairs (a t , l t ) are used to update a feature space vector basis B = (u 1 , . . . , u L ) via the SKL algorithm and Equation (6), sorted by label and introduced into random reservoirs, where they have a chance of staying as reference spectra {r k,i }. During evaluation (in magenta), a new incoming spectrum a t is projected onto feature space, where it is compared with reference spectra. For that to be possible, reference spectra must also be projected onto B.
The likelihood of a t belonging to each tissue type is evaluated through KDE, and a final diagnosisl t is calculated.
For the results section, in order to evaluate the generalization capabilities of this method, we have enforced different approaches. In particular, we have evaluated our algorithm with 10-fold and 3-fold cross-validation, as well as leave-one-out cross-validation (LOOCV). In all cases, we seek to learn from patients instead of separate samples, in order to avoid biasing our algorithm with a priori information. Diagnoses are accounted in a per-sample basis in order to simulate a realistic clinical setting.
In leave-one-out cross-validation, for each of the 116 patients, reservoirs were emptied, SKL was reset to its initial condition, and sample acquisition was performed with the other 115 patients. Then, evaluation was performed on the patient under test, as if it were a new incoming patient that the system had not seen before. Its samples are evaluated separately and their diagnosis are tallied on the confusion matrix. Threshold parameters γ 0 , . . . , γm are to be evaluated as a function of its effect on the Receiver Operating Characteristic (ROC). A similar approach is enforced for 3-fold and 10-fold cross-validation.

Dimensionality reduction and vector basis
The first step in our process is the on-line extraction of a low-dimensional feature space. We must study the characteristics of the basis that describes such space, as well as how many dimensions are necessary to explain most of the features in it. The first five left-singular vectors that conform the space for the HSI melanoma dataset, as well as the cumulative sum of the square of the singular values of Σ in the SVD, are shown in Fig. 2. The first three basis vectors in feature space (those that represent an absorption peak around 550 nm and a slope in the red and infrared wavelengths) are the best feature descriptors for the problem at hand.
Although these feature basis vectors do not represent any isolated physical characteristics for the dataset, a change in slope in the 600-900 nm range characteristic of melanin, and the typical absorption peaks of hemoglobin around 530-600 nm can be seen [17,22]. It must also be noted that, since the SVD is a deterministic operation, the results and ordering of these feature vectors will be a result of the amount of data processed so far. These events are, therefore, the most repeated occurrences in all observed tissue spectra [45].
The low dimensionality required to represent melanocytic lesion spectra, as shown in the right subplot of Fig. 2, must be commented as well. Applying Equation (6) provides us with an L in the range of 3 to 10, depending on the degree of explained variance desired. For the aforementioned σ contrib ≤ 10 −3 criterion, L stays in the range L < 11. This confirms the fact that most of our data lies in a feature space of low dimensionality, as is common in hyperspectral images of biological tissues, and that using more dimensions will not benefit estimation nor classification significantly [9,14]. Figure 3 represents the positions in feature space of 4000 points per pathology, selected via Algorithm R in three distinct reservoirs, and pre-processed by the Standard Normal Variate algorithm (hence, laying on an L-dimensional sphere). In order to test the validity of our ROIs, we should at least compare the average spectra of each pathology with other state-of-the-art studies. Figure 4 displays the average spectra and standard deviation of nevi, melanoma and healthy skin, respectively. Average reflectance as well as standard deviations for each of the pathologies are within range of what has been shown in larger clinical trials [28,30] and in the articles that explain how the dataset was obtained [22,38,41].
A few things can be noted about the spectra. Melanoma spectra are notably different in both shape and absorption in certain wavelengths. The presence of melanin can be seen in the 600-700 nm range, whereas healthy skin shows lower absorption and variance. Note also the lack of absorption in the 400-500 nanometer range in melanoma, perhaps due to blue-white veil present in the images, as well as different concentrations of oxy-and deoxyhemoglobin [17]. Note, though, that the variability of melanoma -as shown by the scatter plots of Fig. 3-is more significant, and thus evaluating spectra with every spectrum in the dataset or with the average spectrum of melanoma turns practically unfeasible and prone to misdiagnosis. This can be explained by the fact that there will be non-cancerous spectra in feature space closer to the average spectrum of melanoma than to the average spectrum of nevi, and viceversa. Such a result suggests that estimating likelihood and class probability metrics in feature space will certainly provide a similarity function that holds true across all cases consequence of interpatient variability.

Segmentation and qualitative results
If we wish to diagnose a lesion by tissue type ratios, we must ensure lesion segmentation works in all the samples. We have chosen leave-one-out cross-validation as a first approach. For a given patient, every sample in the dataset except that patient was used to estimate the PDF function of each tissue type in feature space. Every pixel in each image will have a location in feature space, and therefore will be assigned a tissue class with weighted OVA (depending on the weights shown in Equation (13)). From now on, we will consider three tissue types or hypotheses: the Nevus, Melanoma and Skin classes will be referred to as H 0 , H 1 , and H 2 , respectively (skin-lesion segmentation must be performed automatically, hence why a 'Healthy Skin' reservoir is also considered). For a given sample, the PDF or likelihood values f (x|H 0 ), f (x|H 1 ) and f (x|H 2 ) can be calculated for each pixel. As shown in Equation (13), for classification there is an associated weight for each tissue class, namely γ 0 for Nevus, γ 1 for Melanoma and γ 2 for Skin spectra. This constitutes a three-parameter system with two degrees of freedom: that given by γ 1 in relation to γ 0 will allow us to compare benign with malignant melanocytic spectra, and that given by γ 2 with respect to any pigmented spectra (γ 1 and γ 0 ) will provide segmentation.
Since skin pigmentation will be dependent on patient skin fairness, we have chosen a dynamic version of the elbow method [50]. We shall begin with the maximum likelihood scenario, where γ 0 = γ 1 = 1. For any value of γ 2 , we can calculate the ratio where N les is the number of pixels classified as pigmented spectra (Nevus or Melanoma) that are obtained with Equation (13), and N tot al is the total number of pixels in the image. This ratio can take any value between 0 and 1, with ρ = 1 meaning all pixels are considered pigmented and ρ = 0 meaning no pixels in the image are pigmented. As γ 2 increases, more skin spectra will be classified as skin. The ratio will decrease up to a point of saturation, where ρ stabilizes for a finite interval of γ 2 and the lesion remains classified as such, and afterwards more and more pixels in the lesion are classified as skin until ρ = 0 and the lesion ROI vanishes. Similarly to the dynamic thresholding in Equation (6), we can calculate the finite-difference approximation of the second derivative of ρ with respect to γ 2 . As shown in Fig. 5, the second derivative of ρ(γ 2 ), ρ (γ 2 ), relates with the variation of its curvature. If the maximum value of ρ (γ 2 ) indicates where the elbow is, then stabilization will take place after the elbow, once ρ tends to zero again. Therefore, we can choose a relative dynamic threshold, such that the optimal segmentation value, γ 2,opt , is chosen by following and thus finding the end of the elbow in ρ(γ 2 ). Here, ρ max is the maximum of the second derivative. In other words, we choose a value for γ 2 where the curvature of the size of the lesion stabilizes up to 1/100 of its maximum variation. Since numerical variables usually have noisy values, ρ (γ 2 ) has been stabilized with a Savitzky-Golay filter (window size 11, order 7). This procedure is followed analogously in n-fold cross-validation. Once segmentation is achieved, in-lesion classification just requires separating pigmented pixels with their PDF values. Figures 6 and 7 are two examples of the results provided by the system. The SKL subroutine was defined with batch size P = 200, forget factor f = 1, and = 10 −5 . Reservoir size was 4000 per spectral class. Each figure contains an RGB reconstruction of the absorption spectra, using the CIE 1931 color matching functions (CMFs) and considering a D65 illuminant [51]. The final classification/segmentation results are shown in the right-side subplots of the first row of each example. Segmentation behaves generally well, for as long as there are no hairs in the image (hair has some degree of brown pigmentation, and is sometimes indistinguishable from with melanocytic tissue spectra). Note that this segmentation is obtained by directly comparing the three subplots in the bottom row. In most nevi cases, the melanoma hypothesis is illuminated within the melanocytic lesion, but its likelihood function is usually ten times lower than the nevus function -something which is not the case for malignant lesions, where the 'nevus' and 'skin' likelihood functions are zero and the melanoma likelihood function is larger. It must be noted that, given the higher variability of malignant spectra (as shown in Fig. 3), the PDF values for the Melanoma hypothesis are lower. This is an expected result, since KDE provides a PDF estimate and, by definition, the integral over all L-dimensional space must be equal to one. Nevertheless, the PDF values for the other hypotheses in the regions of feature space dominated by melanoma are zero or near zero, making the Melanoma class the dominant spectral signature. Another characteristic that is of notable interest is the fact that the algorithm tries -in some degree or another-to assign spectra of areas surrounding a melanoma as belonging to the nevi class. In such fringe areas the system simply assigns pixels to the class they are most likely to belong to, as constrained by the segmentation threshold. Given the lack of expert ROI information, is unknown if such regions would be pigmented, non-malignant areas; further research would be needed for a proper assesment of this response.

Diagnostic capability and ROC. Quantitative results
We will undertake now the study of the diagnostic capabilities of our algorithm, by evaluating its Receiver Operating Characteristic (ROC) and typical binary classification parameters. In order to describe a single sample as benign or malignant, a diagnostic rule must be established. For this scenario, given the fact that nevi pixels are rarely classified as malignant, the defined rule is a diagnostic ratio, calculated as follows: given a specific sample with a lesion of size N les (in pixels) containing N mal ≤ N les pixels within the lesion classified as melanoma, it will be diagnosed as malignant if N mal /N les ≥ p holds. In summary, pixels are classified following the weighted one-vs-all method (Equation (13)), while whole-sample diagnosis is achieved with diagnostic ratio p.
In order to test the capabilities of our system, we have evaluated classification accuracy for 3-fold and 10-fold cross-validation, as well as leave-one-out. For leave-one-out, Fig. 8 depicts both the ROC as well as overall system accuracy as a function of γ 1 , swept from γ 1 = 0 to γ 1 = 30, for several values of p: 0.01, 0.05, 0.1 and 0.2. As shown by this figure, the system can achieve stable accuracies up to 95%, depending on diagnostic criterion stringency, and it behaves like an almost-ideal radar detector. This implies, also, that lesion segmentation performs notably well for γ 0 = 1 and γ 2 = 1, given the fact that diagnosis is achieved by only counting pixels within the lesion itself. At peak performance for p = 0.05 (γ = 6.1224), the method is capable of reaching 96.8% sensitivity, 95.7% specificity, and 96.0% accuracy. For the same diagnostic criterion, a null false negative rate can be achieved at γ = 7.9592, with a false positive rate of about 10%. Such a tradeoff is to be expected, as false positives will increase in number as the false negative rate is minimized. Additional cross-validation tests can be found in Tables 2 and 3, with p = 0.05 and maximum accuracy and minimum false negative rate criteria, respectively. We must note that 3-fold cross-validation, due to small dataset size and reference spectra, has notably lower (16% less) accuracy and cannot achieve zero false negatives in the range γ 1 ∈ [0, 30]. This is to be expected, but it is kept in Table 3 for the sake of comparison. As is expected as well, a lower number of reference patients comes associated with lower accuracies, but still within range of state-of-the-art systems [28,32].
This figure also exhibits the change in classification performance as the diagnostic ratio increases. As is expected, the larger the value that this ratio takes, the larger γ 1 must be in order to classify the same amount of pixels as malignant and thus reach the same diagnosis. This relationship between γ 1 and p is empirical and therefore a conservative criterion was chosen: if a given ratio p is selected such that it contains the peak accuracy point for a value of γ 1 in the order of the other values {γ i }, then that p is adequate for classification, as long as the γ 1 that maximizes accuracy can be selected. Thus, p = 0.05 was selected as an adequate diagnostic ratio.
The diagnostic power of likelihood functions f 1 (x) and f 0 (x) can also be studied by obtaining their average values within each lesion. Then, relationships such as d(x) = f 1 (x) − f 0 (x) or f 1 / f 0 can be calculated for each lesion separately. The functions will be suitable for diagnostic classification if they present different values per category. The average values of f 1 (x) and f 0 (x) per sample were obtained, their difference calculated and portrayed in a boxplot (Fig. 9). Although the algorithm tends to provide larger values to the nevi category, d(x) presents clear separability per tissue class by at least one full standard deviation. Table 2. Overall system performance for different cross-validation (CV) settings (leave-oneout, 10-fold and 3-fold) given a diagnostic ratio of p = 0.05 and maximizing for accuracy. For 3-fold and 10-fold CV, half the standard deviation of each parameter is also indicated.

Method
Optimal γ 1 Sensitivity at optimal γ 1 Specificity at optimal γ 1 Precision at optimal γ 1 Accuracy at optimal γ 1 LOOCV  Table 3. Overall system performance for different cross-validation (CV) settings (leave-oneout, 10-fold and 3-fold) given a diagnostic ratio of p = 0.05, and minimizing false negative rates (FNR). For 3-fold and 10-fold CV, half the standard deviation of each parameter is also indicated.

Performance as a function of reservoir size
It is clear, given Equation (7), that the presented methodology is highly dependent on how good the PDF approximations are. As with all interpolation procedures, the quality of such approximation will be a function of the number of pixels used in it. Although it has been shown in previous work that KDE can interpolate with few pixels with notable accuracy [14], we must study the influence of PDF estimation on overall classification performance. For Figure 10, the effect of reservoir size on diagnostic criteria and PDF estimate fidelity was put to the test, in the following procedure. Leave-one-out cross validation was repeated on a fifth of the samples (i.e. Samples 1, 6, 11, 16, ..., up to Sample 116). First, ten simulations with a reservoir size per hypothesis of 200 (as depicted by Fig. 1) obtained the PDF estimates of each sample using leave-one-out on this reduced dataset, as in previous sections. This was then repeated for other reservoir sizes, namely 500, 1000, 2000 and 4000.
Each of the 10 simulations provide a different PDF estimate of each sample. For the leftmost subplot of Fig. 10, we calculated the variance in average PDF value among the 10 simulations in each pixel, then calculated and then averaged for all pixels and all 25 samples. Such procedure is then repeated for all the other reservoir sizes. This provides us a proof that the PDF estimates, although obtained by means of a random reservoir, will vary on average in the order of 10 −4 for benign tissue and 10 −8 for malignant tissue, depending on the reference pixels used, which is rather negligible (< 1%) for PDF estimate values in the range [0.02, 2].
All 10 PDF estimates per sample are then classified following the accuracy-maximization criterion -weighted one-versus-all, γ 1 = 6.114, p = 0.05 (5%)-. Segmentation parameters were kept as specified in Section 3.2. For each simulation we classified, as indicated in Section 3.3, all pixels in each sample, and obtained the number of lesion pixels N les and the number of malignant pixels N mal as well. A sample will be classified as malignant if N mal /N les ≥ p, and therefore the variance of this ratio amongst simulations must be as close to zero as possible in order for this method to be valid. Variance in N mal /N les present among simulations was calculated and plotted, per sample, in the right and center subplots of Fig. 10. This variance is worse, as expected, between simulation runs with smaller reservoirs, although it remains well under 10 −2 for reservoirs of size greater than 1000 in most samples. This, in turn, means that only a small number of pixels per lesion are differently classified in different simulations, which empirically demonstrates that both the PDF estimates provided by KDE and the defined diagnostic criterion are resilient to changes in the reference population, as long as such population is sufficiently large and accurately represents each pathology in feature space.

Conclusions
In this work, the ability of reservoir-based KDE to segment and classify hyperspectral images while still correcting for interpatient and intersample variability has been described and empirically tested, and a simple protocol inspired by the Neyman-Pearson lemma has been described. The system can be configured to maximize accuracy, obtaining a nominal sensitivity, specificity and accuracy of 96.8%, 95.7% and 96.0%, respectively, on a hyperspectral skin lesion dataset with 116 patients. It can also minimize false negative rates, obtaining a minimum false positive rate of 9.5%. These statistics are either comparable or an improvement over other state-of-the-art algorithms for spectral images of nevi [22,25,27,31,33,38,40], while its theoretical background simplifies its usage, as it has but one degree of freedom in its operation. In order to avoid complicating the clinical translation of this methodology, we seek to adapt the method to the dermatologists' needs. Modifying classification parameter γ 1 provides for now two modes of operation, which allows the algorithm to work in different scenarios: • If we let γ 1 take the optimal values of Table 2, that is, the value that maximizes accuracy for our diagnostic criterion, we would be operating at optimal performance, at the maximum true positive rate and minimum false negative rate that is achievable simultaneously.
• Finally, if we used γ 1 as shown in Table 3, we would be operating in a zero-false-negativerate regime, which would perform in a greedier fashion, at the expense of not misdetecting any melanomas.
A notable difference is that, while classification results are well within those of state-of-the-art machine learning systems, the procedure by which it takes place does not require training per se. Instead, it estimates the probability distribution of each pathology in a feature space obtained by reference data. No hidden function is calculated and no parameter learning takes place. Segmentation threshold γ 2 is obtained dynamically in each lesion to achieve automatic segmentation and the malignant detection threshold parameter γ 1 could be either ignored (γ 1 = 1), or set to the aforementioned 'maximum accuracy' or 'minimum false negative rate' operation modes, which could be recalculated as new information arrives to the system.
It is fundamental to point also that this method could be used in other classification problems with high dimensionality, even those presenting noise. As long as the magnitude of the noise does not exceed significantly that of the features that differentiate classes among themselves -which would make separability impossible in any case-, KDE would provide a smooth PDF estimate, and classification by thresholding could be performed. Thus, applications of this algorithm include, but are not limited to: on-line tissue segmentation during surgery; detection of the presence of fluorescent markers and spurious artifacts; textural analysis and classification of superficial lesions.
In a clinical setting, imaging a single lesion takes about 20-30 minutes. Learning and diagnosis without methodology requires less than a minute (from milliseconds up to a second for the diagnostic step, if the algorithm has been prepared in advance), which is acceptable in the diagnosis of suspicious lesions [22]. Ideally, the rise of new modulated imaging systems that can provide depth-resolved absorption and scattering information, such as those utilized recently in melanocytic specimens [52], could also obtain with this method three-dimensional probability maps of the hyperspectral voxels they are capable of measuring. That could provide surgically relevant information in many subfields of oncological surgery, such as lumpectomy surgery -e.g. providing the depth of the surgical margins in any lesion-and, in the field of dermatology, provide an estimate of the Breslow depth of any skin lesion.
For now, given the exhibited high speed, accuracy, and segmentation capabilities of this algorithm, the time required for proper diagnosis will be only limited by the imaging system, and thus a first clinical trial could already take place, one in which the proposed methodology is employed alongside already-existing MSI/HSI cameras and systems such as MelaFind [20], providing a diagnostic assessment that minimizes false negatives and outputs the overall likelihood of malignancy of any melanocytic lesion. This information could be used in secondary prevention, and assist general practitioners and expert dermatologists in their mighty endeavor.

Disclosures
The authors declare that there are no conflicts of interest related to this article.