Raman active components of skin cancer

Abstract: Raman spectroscopy (RS) has shown great potential in noninvasive cancer screening. Statistically based algorithms, such as principal component analysis, are commonly employed to provide tissue classification; however, they are difficult to relate to the chemical and morphological basis of the spectroscopic features and underlying disease. As a result, we propose the first Raman biophysical model applied to in vivo skin cancer screening data. We expand upon previous models by utilizing in situ skin constituents as the building blocks, and validate the model using previous clinical screening data collected from a Raman optical fiber probe. We built an 830 nm confocal Raman microscope integrated with a confocal laser-scanning microscope. Raman imaging was performed on skin sections spanning various disease states, and multivariate curve resolution (MCR) analysis was used to resolve the Raman spectra of individual in situ skin constituents. The basis spectra of the most relevant skin constituents were combined linearly to fit in vivo human skin spectra. Our results suggest collagen, elastin, keratin, cell nucleus, triolein, ceramide, melanin and water are the most important model components. We make available for download (see supplemental information) a database of Raman spectra for these eight components for others to use as a reference. Our model reveals the biochemical and structural makeup of normal skin, nonmelanoma and melanoma skin cancers, and precancers, and paves the way for future development of this approach to noninvasive skin cancer diagnosis.


Introduction
As the most common type of malignancy, skin cancer accounts for over 5.4 million cases and 10,000 deaths per year in the US alone [1]. At present, the clinical diagnosis of skin cancer relies on visual inspection of suspicious lesions followed by biopsy and histopathology. The biopsies are performed in a dermatologist's office and then sent to the histopathology lab for further examination. A few additional days may be needed to deliver the final results. These biopsy procedures are invasive, inefficient, and inconvenient. More importantly, the process has low diagnostic accuracy (49-81% among dermatologists for melanoma, the most deadly form [2]). The number of pigmented lesions needed to be excised to identify one melanoma ranges from 22 to 59 for experienced versus new general practitioners [3]. Large numbers of biopsies are performed on benign skin, leading to a substantial financial burden on the healthcare system and the patient, along with considerable patient discomfort. As a result, a critical need exists to develop a noninvasive, accurate, fast, and inexpensive method for early skin cancer screening.
Raman spectroscopy (RS) is a noninvasive optical technique sensitive to the molecular composition of biological tissues. Recent advances in near-infrared lasers, optical filters, fiber optics and CCD cameras have greatly improved its sensitivity in detecting the chemical composition of biological tissues. In the most recent decade, the Raman optical fiber probe has allowed for fast and accurate cancer diagnostics, including ex vivo study of breast [4,5], prostate [6], lung [7] and skin [8,9], and in vivo study of breast [10], cervical [11], and skin [2,12]. Recent clinical studies from our group [12] and others [2] have demonstrated that RS has high diagnostic accuracy in discriminating skin melanoma from nonmelanoma pigmented lesions.
To date, most fiber probe-based approaches have used statistical algorithms to describe the spectral differences of RS data, such as principal component analysis (PCA) [13] and independent component analysis (ICA) [14]. However, the principal or independent components are difficult to relate to the biophysical origin of disease, such as the microstructural organization of proteins and lipids and the functional state of cellular metabolism. These microstructural changes are what pathologists and dermatologists use to make diagnostic decisions and decide on the most appropriate treatment [10].
As a result, we describe for the first time a Raman "biophysical model" of human skin. This inverse model derives the morphological and biochemical composition of skin tissue from its Raman spectrum. The building blocks of our model are Raman active components extracted from skin in situ. Previous biophysical models used model components either measured directly from synthetic/purified chemicals [9,15-18], or extracted from tissue sections in situ [19-21]. The advantage of using synthetic/purified chemicals as model components is that they can be easily measured without the need for Raman micro-imaging. This concept has been applied successfully to a previous Raman model of excisional skin biopsies [9] and has given us some prior knowledge about skin composition. Other groups used in situ constituents to build biophysical models of other disease processes, such as coronary atherosclerosis [19], breast cancer [20], and brain tumors [21]. Here, we expand upon this approach by developing a biophysical human skin model using in situ skin constituents. In situ constituents better represent the milieu of biological tissues that cannot be recapitulated in a synthetic environment. In human skin many constituents are present in various forms and each has a slightly different Raman spectrum. For instance, both collagen type I and III are abundant in human dermis; however, if both of them are included in the model, it may lead to overfitting and unstable results. In addition, skin constituents synthesized in the lab or from commercial sources are not in their natural states as in the human skin. For instance, the Raman features of a protein may change when it is exposed to organic solvents during synthesis. As a result, a single spectrum of a skin constituent extracted from its microenvironment can provide a more general picture of the biophysical origins of skin spanning normal and abnormal disease states.
Furthermore, we applied the model for the first time to in vivo human skin cancer screening data that covers a wide range of normal, nonmelanoma and melanoma skin cancers and precancers [12]. Previous Raman models of human skin were either built for ex vivo tissue specimens [9] or in vivo normal skin [22]; however, Raman biophysical models have not been applied to in vivo skin cancer screening to interpret biophysical changes between pathologies.
In this study, we performed Raman imaging of tissue sections using a custom confocal Raman microscope. Multivariate curve resolution (MCR) analysis [23] was used to resolve in situ skin constituents from Raman images. Our results suggest that eight skin constituents are the most relevant building blocks, and that their spectra differ in some respects from those of the corresponding synthetic components. The basis spectra of those skin constituents were then combined linearly to describe in vivo human skin spectra. The fit coefficients provided insight into the biochemical and structural composition of normal, benign and malignant skin tissues. Our model revealed the most important skin constituents representing the spectral features of skin tissues, and provided guidance for developing diagnostic algorithms for real-time noninvasive skin cancer diagnosis in the future.

Raman instrumentation
Raman images were collected with a custom-built confocal Raman microscope illustrated in Fig. 1. The system was also integrated with a confocal laser-scanning microscope (CLSM) and a bright-field microscope, which provided morphology images to assist in locating the region of interest for Raman micro-imaging. The Raman excitation source was an 830 nm single-mode diode laser (LM830-PLR200, Ondax). The laser beam was reshaped, expanded and delivered to the sample through a microscope objective (Olympus, NA = 1.2, 60x). The galvanometer mirror performed 2D raster scanning on the tissue. The backscattered Raman signal was collected by a spectrograph (f/1.8i, Kaiser) and a deep-cooled CCD camera (IDUS, Andor) through an optical fiber (50 μm, NA = 0.22), which also acted as a pinhole. The Rayleigh scattered light was collected by a PMT (C10709, Hamamatsu), and amplified by a current preamplifier (SR570, Stanford Research Systems). A data acquisition board (PCIe-6351, National Instruments) and LabVIEW software (National Instruments) were used to control the system. The power delivered to the sample was approximately 45 mW. Lateral resolution was measured from the FWHM (full width at half maximum) of the point spread function using 0.2 and 0.5 μm microbeads. Axial resolution was measured from the FWHM of the intensity profile obtained by translating a mirror towards the objective. The lateral, axial and spectral resolutions of the system were approximately 1 μm, 8 μm and 8 cm⁻¹, respectively.
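The FWHM measurement described above can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function name `fwhm`, the linear interpolation of the half-maximum crossings, and the synthetic Gaussian profile (with an arbitrary width) are all assumptions made for illustration.

```python
import numpy as np

def fwhm(x, y):
    """Full width at half maximum of a single-peaked profile.

    The half-maximum crossings on both flanks are located by
    linear interpolation between neighbouring samples.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    y = y - y.min()                      # remove a constant offset
    half = y.max() / 2.0
    above = np.where(y >= half)[0]       # samples at or above half maximum
    i0, i1 = above[0], above[-1]
    if i0 == 0 or i1 == len(y) - 1:
        raise ValueError("profile does not drop below half maximum on both sides")
    # Rising flank: y[i0 - 1] < half <= y[i0]
    left = np.interp(half, [y[i0 - 1], y[i0]], [x[i0 - 1], x[i0]])
    # Falling flank: y[i1 + 1] < half <= y[i1]
    right = np.interp(half, [y[i1 + 1], y[i1]], [x[i1 + 1], x[i1]])
    return right - left

# Hypothetical intensity profile: a Gaussian with sigma = 0.4 (position in um)
x = np.linspace(-3.0, 3.0, 601)
sigma = 0.4
profile = np.exp(-x**2 / (2.0 * sigma**2))
width = fwhm(x, profile)   # for a Gaussian, FWHM = 2*sqrt(2*ln 2)*sigma
```

For a measured bead or mirror scan, `x` would be the scan position and `y` the recorded intensity; noisy profiles may need smoothing or a model fit before the half-maximum crossings are located.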

Tissue preparation and Raman micro-imaging
Our study was approved by the Institutional Review Board at The University of Texas at Austin and Seton Medical Center. Fresh frozen human skin tissue samples were acquired from biopsy specimens during routine skin cancer surgery at Austin Dermatologic Surgery Center. After being transferred to the lab, the samples were stored at −80 °C. They were then warmed to −22 °C in a cryostat and sliced into 10 μm thin sections. The sections were transferred to magnesium fluoride slides (Edmund Optics) for the Raman measurement, and serial sections were transferred to standard microscope slides for hematoxylin and eosin (H&E) staining. Prior to Raman imaging, the sections were warmed to room temperature and kept moist with 0.9% saline solution.
Next, we performed Raman imaging on the skin sections. The typical integration time at each pixel location was 2 s. The typical step size in both the x and y directions was 1 μm, but a step size of 5 μm was sometimes used to achieve a larger field of view. The imaging area varied from 30 × 30 μm² to 150 × 150 μm². We then correlated the Raman image with the histopathology image of the serial stained section. A board-certified dermatologist assisted in identifying and confirming the morphology and biochemical components measured. In total, we collected more than 40 Raman images from samples of different disease states, including 24 images from 11 basal cell carcinoma (BCC) patients, 15 images from 5 squamous cell carcinoma (SCC) patients and 4 images from 1 malignant melanoma (MM) patient.

Data preprocessing and MCR analysis
Raman data preprocessing was performed using MATLAB (R2015b, MathWorks). All spectra underwent wavelength calibration, background subtraction, cosmic ray removal and smoothing. The system spectral response was calibrated using a tungsten halogen lamp (LS-1-CAL, Ocean Optics). The fluorescence background was then removed using a modified 5th-order polynomial fitting routine [24]. The effective spectral range was 800 to 1800 cm⁻¹.
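As an illustration of polynomial fluorescence background removal, the following Python sketch implements one common iterative variant: a polynomial is fitted, points lying above the fit (Raman peaks) are clipped down to it, and the fit is repeated. This is a sketch under stated assumptions, not the routine of [24] or the authors' MATLAB code; the function name, the stopping rule, and the synthetic spectrum are hypothetical.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def modified_polyfit_baseline(wavenumber, spectrum, order=5, max_iter=200, tol=1e-4):
    """Estimate a smooth background by iteratively clipped polynomial fitting."""
    x = np.asarray(wavenumber, dtype=float)
    # Rescale the axis to [-1, 1] so the high-order fit is well conditioned
    xs = 2.0 * (x - x.min()) / (x.max() - x.min()) - 1.0
    work = np.asarray(spectrum, dtype=float).copy()
    for _ in range(max_iter):
        coeffs = P.polyfit(xs, work, order)
        fit = P.polyval(xs, coeffs)
        clipped = np.minimum(work, fit)      # pull peaks down onto the current fit
        change = np.linalg.norm(work - clipped)
        work = clipped
        if change <= tol * np.linalg.norm(work):
            break
    return fit

# Hypothetical test spectrum: smooth background plus three narrow Raman bands
wn = np.linspace(800.0, 1800.0, 1001)
u = (wn - 1300.0) / 500.0
background = 1000.0 + 400.0 * u - 300.0 * u**2
centers, height, sigma = [1003.0, 1300.0, 1660.0], 100.0, 4.0
peaks = sum(height * np.exp(-(wn - c)**2 / (2.0 * sigma**2)) for c in centers)
spectrum = background + peaks

baseline = modified_polyfit_baseline(wn, spectrum, order=5)
corrected = spectrum - baseline
```

The polynomial order trades flexibility against the risk of fitting into broad Raman bands; the 5th order used here follows the text, while the iteration limit and tolerance are arbitrary choices.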
A multivariate curve resolution (MCR) method was employed to resolve individual morphological or biochemical components from the Raman image. This method has been successfully applied to stimulated Raman imaging data by Zhang et al. [25]. The basic concept of MCR is to decompose the raw spectra matrix $D$ (unfolded from the Raman image) into the product of two smaller matrices, $C$ and $S^{T}$, by a bilinear model:
$$D = C S^{T} + E$$
where $S^{T}$ is the matrix of the pure spectra, $C$ contains the related concentration profiles for each of the components, and $E$ is the error matrix. Because MCR is an unsupervised method, the number of components contributing to $D$ was determined either by prior knowledge or by assessing the results obtained using singular value decomposition (SVD). After an initial estimate is given for $S^{T}$, $C$ and $S^{T}$ are optimized iteratively using an alternating least-squares (ALS) algorithm until convergence is reached.
Here, we used a MATLAB-based MCR-ALS toolbox [23] to determine $C$ and $S^{T}$. The initial estimates of $C$ and $S^{T}$ were determined by means of a purest variable detection method [26]. The basic idea is to resolve highly overlapping near-infrared spectra with baseline problems by using the second-derivative spectra [26]. A nonnegativity constraint and a tolerance of 10% were applied in the ALS optimization. The concentration images for each individual component were reconstructed from $C$, and the corresponding basis spectra were obtained from $S^{T}$. We then categorized the basis spectra according to their biochemical or structural origin, such as elastic fibers, collagen fibers and cell nucleus.
We obtained a library of basis spectra from various skin sections spanning normal skin and various skin disease states. The spectra in the same category were then averaged to create a single basis spectrum representing that biochemical or structural component. Although the basis spectra collected from different patients had minor differences, averaging spectra from many patients reduced the influence of inter-patient variation.
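The alternating least-squares idea can be sketched in Python as follows. This is not the MCR-ALS toolbox of [23]: the initial estimate uses a crude "most dissimilar spectra" selection rather than the purest variable method of [26], nonnegativity is enforced with `scipy.optimize.nnls`, and the image size, band positions and stopping rule are hypothetical. The array `S` below holds the pure spectra as rows, so it plays the role of the transposed spectra matrix in the bilinear model.

```python
import numpy as np
from scipy.optimize import nnls

def initial_spectra(D, k):
    """Pick k mutually dissimilar rows of D as starting spectra (an assumed heuristic)."""
    U = D / np.linalg.norm(D, axis=1, keepdims=True)
    chosen = [int(np.argmin(U @ U.mean(axis=0)))]    # spectrum least like the mean
    while len(chosen) < k:
        similarity = (U @ U[chosen].T).max(axis=1)    # closeness to any chosen spectrum
        chosen.append(int(np.argmin(similarity)))
    return D[chosen].copy()

def mcr_als(D, k, max_iter=200, tol=1e-8):
    """Factor D (pixels x channels) as C @ S with C >= 0 and S >= 0."""
    S = initial_spectra(D, k)                         # (k, channels)
    previous = np.inf
    for _ in range(max_iter):
        C = np.array([nnls(S.T, d)[0] for d in D])    # concentrations, pixel by pixel
        S = np.array([nnls(C, d)[0] for d in D.T]).T  # spectra, channel by channel
        error = np.linalg.norm(D - C @ S) / np.linalg.norm(D)
        if previous - error < tol:
            break
        previous = error
    return C, S, error

# Hypothetical 20 x 15 pixel image built from two synthetic component spectra
wn = np.linspace(800.0, 1800.0, 200)
def band(center, width):
    return np.exp(-(wn - center)**2 / (2.0 * width**2))
S_true = np.vstack([band(1003, 10) + band(1450, 20), band(1093, 12) + band(1660, 25)])
rng = np.random.default_rng(0)
C_true = rng.uniform(0.05, 1.0, size=(300, 2))
C_true[:20] = [1.0, 0.0]      # a region containing only component 1
C_true[20:40] = [0.0, 1.0]    # a region containing only component 2
image = (C_true @ S_true).reshape(20, 15, wn.size)

D = image.reshape(-1, wn.size)             # unfold the image into the matrix D
C, S, error = mcr_als(D, k=2)
concentration_images = C.reshape(20, 15, 2)
```

Because each nonnegative least-squares step solves its subproblem exactly, the fit error cannot increase from one iteration to the next. The factorization is still subject to the usual scaling and rotational ambiguity of bilinear models, so resolved spectra should be checked against known morphology, as the text describes.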

Clinical screening data description
In vivo human skin spectra came from our previous skin cancer screening study [12]. Data were collected with an optical fiber probe [27] integrated in a multimodal spectroscopy system [28] at different sites, such as scalp, nose, earlobe, shoulder and thigh. Lesion types included basal cell carcinoma (BCC, 19 lesions), squamous cell carcinoma (SCC, 38 lesions), actinic keratosis (AK, 14 lesions), benign pigmented lesion (PL, 17 lesions) and malignant melanoma (MM, 12 lesions). BCC and SCC are the most common types of nonmelanoma skin cancers, whereas AK and PL are the most common precancers of SCC and MM, respectively. Raman spectra of adjacent normal skin were also collected for each individual lesion. Although normal skin measurements were not verified by histopathology, they were visually verified to be normal by an experienced dermatologist/physician assistant.

Model establishment
A sample's Raman spectrum can be represented as a linear combination of the Raman spectra of the sample's individual constituents. The signal intensity is then proportional to the chemical concentration [29]. Therefore, if one knows the spectra of the basis tissue constituents a priori, one can determine the concentration of those basis constituents. We used linear least-squares fitting with a nonnegativity constraint for model fitting, according to the following equation:
$$X = s\,c + e$$
where $X$ is the sample's spectrum (in vivo human skin spectrum), $s$ is the spectra matrix of the sample's individual constituents, $c$ is the relative spectral contribution (fit parameter) predicted by the model, and $e$ is the noise associated with the clinical RS system. Next, a combination of forward selection and backward elimination methods was used to identify the basis constituents most relevant to the spectroscopic model. Finally, after applying the model to all the in vivo human skin data, we could obtain the biochemical and structural makeup of tissues spanning normal and various disease states.
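A nonnegative linear least-squares fit of this kind can be sketched in Python with `scipy.optimize.nnls`. The basis spectra below are synthetic Gaussian bands chosen only for illustration, and the function name `fit_spectrum` is hypothetical; the variable names `X`, `s`, `c` and `e` follow the symbols used in the text.

```python
import numpy as np
from scipy.optimize import nnls

def fit_spectrum(X, s):
    """Fit X ~ s @ c with c >= 0.

    X : (channels,) measured spectrum
    s : (channels, components) matrix whose columns are basis spectra
    Returns the fit coefficients, the coefficients rescaled to sum to 1,
    and the residual spectrum e.
    """
    c, _ = nnls(s, X)
    e = X - s @ c
    total = c.sum()
    c_rel = c / total if total > 0 else c
    return c, c_rel, e

# Hypothetical basis spectra (three components) on an 800-1800 cm^-1 axis
wn = np.linspace(800.0, 1800.0, 501)
def band(center, width):
    return np.exp(-(wn - center)**2 / (2.0 * width**2))
s = np.column_stack([
    band(1003, 8) + band(1450, 20),
    band(1093, 10) + band(1660, 25),
    band(1296, 10) + band(1440, 12),
])

c_true = np.array([2.0, 0.0, 1.0])   # the second component is absent
X = s @ c_true                       # noise-free synthetic "tissue" spectrum
c, c_rel, e = fit_spectrum(X, s)
```

In this noise-free example the fit recovers `c_true` and the rescaled coefficients sum to 1. With real data the residual `e` also absorbs system differences and any constituent missing from the basis set, so it is worth inspecting alongside the coefficients.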
One important factor that may influence the performance of the model is collinearity of the basis spectra. Collinearity is a common issue in linear regression that may lead to unstable results [30]. A collinearity coefficient was calculated for each pair of basis spectra, x and y. A value of 0 means that the two basis spectra x and y are not collinear, and a value of 1 indicates that the two vectors are the same. This coefficient was used for the initial evaluation of the model components.
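One measure with the properties stated above (0 for no collinearity, 1 for identical spectra) is the normalized inner product of two nonnegative spectra. The Python sketch below uses that measure as an assumption; it is not necessarily the formula the authors used, and the function names and toy spectra are hypothetical.

```python
import numpy as np

def peak_normalize(v):
    """Rescale a spectrum to a minimum of 0 and a maximum of 1."""
    v = np.asarray(v, dtype=float)
    return (v - v.min()) / (v.max() - v.min())

def collinearity(x, y):
    """Normalized inner product (assumed measure): 1 for identical shapes,
    0 for nonnegative spectra with no overlap."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

def collinearity_table(spectra):
    """Pairwise coefficients for a dict mapping component name to spectrum."""
    names = list(spectra)
    table = np.array([[collinearity(spectra[a], spectra[b]) for b in names]
                      for a in names])
    return names, table

# Toy spectra: A and B nearly coincide, C has bands elsewhere
wn = np.linspace(800.0, 1800.0, 1001)
def band(center, width):
    return np.exp(-(wn - center)**2 / (2.0 * width**2))
toy = {
    "A": peak_normalize(band(1003, 10) + band(1450, 15)),
    "B": peak_normalize(band(1005, 10) + band(1452, 15)),
    "C": peak_normalize(band(1660, 12)),
}
names, table = collinearity_table(toy)
```

Pairs with a coefficient close to 1, such as A and B here, are candidates for being represented by a single model component.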

Cellular components
To identify cellular tumor components, Raman micro-imaging was performed within a tumor cluster in a BCC section (Fig. 2). Using MCR analysis, we reconstructed three concentration images (Fig. 2a-c) corresponding to cell nucleus, cell cytoplasm, and the Raman substrate. Those structures correlated well with the bright-field, CLSM, and histopathology images (Fig. 2d-f), and their Raman spectra had characteristic peaks similar to those of known spectra measured from pure chemicals (Fig. 2). This approach was also used to resolve the other skin constituents in the following sections. As seen from the plots on the right, the basis spectra of in situ nucleus and synthetic DNA (Sigma-Aldrich) are similar; both show a pronounced contribution from the phosphodioxy group (PO₂⁻) at 1093 cm⁻¹. However, the difference spectrum shows that in situ nucleus differs substantially from synthetic DNA. For example, in situ nucleus appears to have a higher contribution from the DNA backbone at 835 cm⁻¹ [31]. The spectra of in situ cytoplasm and synthetic actin are also highly similar, but major differences can be found at the 1003 cm⁻¹ phenylalanine peak and the 1081 and 1092 cm⁻¹ lipid bands. Numerous other peaks can also be seen in the difference spectra. These differences indicate that the morphologically derived basis spectra of nucleus and cytoplasm include features related to other elements found in the cell.

Epidermal ECM
The epidermal layer of skin provides the barrier to water permeation and abrasion resistance. It is produced by continuous cell division of keratinocytes in the basal layer. Ultimately, the keratinocytes cornify and produce the stratum corneum, which consists of dead, flattened cells at the outermost layer of the skin [32]. Because keratin is the main chemical component of epidermal ECM, we use in situ keratin to represent epidermal ECM. Figure 3 illustrates Raman imaging performed on epidermis from a normal skin section. Tissue architecture correlates well with the histopathology image. The concentration images of epidermal ECM and the Raman substrate were reconstructed using MCR analysis. The Raman spectra of in situ and synthetic keratin are similar, with substantial differences found at the protein bands at 855, 1318 and 1409 cm⁻¹ [33].

Dermal ECM
Dermal ECM comprises fibrillar collagens and associated proteins. Collagen fibers account for about 70% of the weight of dry dermis, while elastin maintains skin elasticity through a durable cross-linked array. Large diameter elastin-rich elastic fibers reside in the reticular dermis [34]. Figure 4 illustrates Raman imaging performed on a BCC skin section to extract dermal ECM proteins. The in situ collagen (collagen fiber), in situ elastin (elastic fiber), dye, and Raman substrate were resolved from the image by MCR analysis. The thin blue-gray elastic fibers and the pink collagen fibers can be identified from the histopathology image. The plots on the right compare the Raman spectrum of in situ collagen with synthetic type I and III collagen. Major differences are found at the 856, 1248 and 1665 cm⁻¹ protein bands between in situ and type I collagen, and at 1157 and 1514 cm⁻¹ between in situ and type III collagen. In situ and synthetic elastin have very similar spectra, which indicates that elastin is the major chemical component of elastic fibers.

Fig. 4. Extracting dermal components from a BCC skin section. In situ collagen (a) and elastin (b) are resolved from the image. The dye used by the dermatologist to mark the orientation of the tissue was also detected (c). Raman images are compared with the bright-field image (d), CLSM image (e) and histopathology image (f). The box on (e) marks the location of Raman imaging. The arrow in (f) points to a thin blue-gray elastic fiber. Plots on the right display the Raman spectra of in situ collagen and synthetic collagen with their difference spectra, and the Raman spectra of in situ elastin and synthetic elastin with their difference spectrum. Scale bar: 10 μm.

Lipids
The skin's epidermal surface lipids consist of sebaceous lipids and stratum corneum (SC) lipids. Epidermal lipids act like a cement to fill the spaces between the cells. The major constituents of sebaceous lipids are triglycerides (triolein), wax esters and squalene, while the epidermal lipids are a mixture of ceramides, free fatty acids and cholesterol [35,36]. Ceramide is an important epidermal surface lipid as it composes almost half of the SC lipids [32].
Raman imaging was also performed to derive the basis spectra of lipids. Figure 5 illustrates the extraction of in situ ceramide and triolein within a hair follicle from an SCC skin section. The synthetic spectra are not shown because they look similar to the in situ spectra. Instead, we compare the difference spectra between in situ lipids. Although in situ ceramide and palmitic acid look similar, they differ in spectral intensity in the C-C stretching modes at 1063 and 1128 cm⁻¹, the CH₂ twisting mode at 1296 cm⁻¹, the CH₂ bending mode at 1440 cm⁻¹ and the C=C stretching mode at 1656 cm⁻¹. Larger variance was observed in those bands between in situ ceramide and triolein. Triolein is abundant not only in skin lipid, but also in subcutaneous fat [20]. As triolein has a very strong Raman scattering cross-section, it contributes greatly to the Raman spectrum of human skin.

Pigments
Skin pigments include melanin and beta carotene. Melanin is produced by melanocytes in the basal layer of the epidermis. In Fig. 6, we identified melanin from an MM skin section. As expected, melanin provides strong contrast in the CLSM image [37]. We lowered the laser excitation power to 20 mW to reduce tissue burning caused by the strong absorption of melanin. As this lowered the SNR, we further smoothed the melanin spectrum by fitting it to Gaussian functions [38]. The two broad peaks located at 1378 cm⁻¹ and 1573 cm⁻¹ were consistent with the spectrum of in vivo cutaneous melanin [38]. Beta carotene is a plant-derived carotenoid. It was extracted from skin sections adjacent to fatty tissue. The characteristic peaks of beta carotene at 1008, 1156 and 1515 cm⁻¹ are consistent with a previous study [39].
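Smoothing a noisy two-band spectrum by fitting Gaussian functions can be sketched in Python with `scipy.optimize.curve_fit`. The sketch assumes exactly two Gaussian bands with no baseline term; the band centers come from the text, while the amplitudes, widths, noise level and starting values are hypothetical. It is an illustration, not the fitting procedure of [38].

```python
import numpy as np
from scipy.optimize import curve_fit

def two_gaussians(x, a1, mu1, w1, a2, mu2, w2):
    """Sum of two Gaussian bands (amplitude, center, width for each)."""
    return (a1 * np.exp(-(x - mu1)**2 / (2.0 * w1**2))
            + a2 * np.exp(-(x - mu2)**2 / (2.0 * w2**2)))

# Synthetic noisy spectrum with broad bands at 1378 and 1573 cm^-1
rng = np.random.default_rng(0)
wn = np.linspace(800.0, 1800.0, 500)
truth = two_gaussians(wn, 1.0, 1378.0, 70.0, 0.9, 1573.0, 50.0)
noisy = truth + 0.05 * rng.standard_normal(wn.size)

# Starting values read off the noisy spectrum by eye (assumed)
p0 = [1.0, 1380.0, 60.0, 1.0, 1570.0, 60.0]
popt, pcov = curve_fit(two_gaussians, wn, noisy, p0=p0)
smoothed = two_gaussians(wn, *popt)   # noise-free model used as the basis spectrum
```

A model fit of this kind replaces the measured spectrum with a smooth curve, so any real feature not described by the chosen bands is discarded along with the noise.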

Miscellaneous
In Raman imaging, water came from the saline used to keep the skin section moist. We found that water plays an important role in fitting the broad Raman band at 1645 cm⁻¹. Hemoglobin and calcium hydroxyapatite (CaH) were only detected in one skin section but were included in our library. Morphologies such as hair follicle (HF) and keratin pearl (KP) were also obtained. KP was extracted from SCC lesions with accelerated keratinization. Figure 7 shows the basis spectra of these constituents. Although the spectra of HF and KP are similar, the difference spectrum suggests that the former contains cellular information (DNA backbone at 835 cm⁻¹ and phosphodioxy group PO₂⁻ at 1093 cm⁻¹). Finally, we included a spectrum of the fiber background generated by the Raman optical fiber probe. This component is used to fit the broad peak between 1000 and 1100 cm⁻¹ in the in vivo data.

Fig. 7. Basis Raman spectra of water, calcium hydroxyapatite (CaH), hemoglobin (Hb), hair follicle (HF) and keratin pearl (KP) collected in situ. The difference spectrum between HF and KP is also shown.

Biophysical modeling results
A total of fifteen candidate model components were derived. The basis spectra were peak normalized to a minimum value of 0 and a maximum value of 1. Their collinearity coefficients are displayed in Table 1. Beta carotene and calcium hydroxyapatite are not shown because they have low collinearity (< 0.50) with other components. Several components have high collinearity, such as in situ keratin in epidermis, keratin pearl (KP), and hair follicle (HF), likely because keratin dominates their chemical composition. As a result, we selected only one model component to represent keratin. In addition, we observed high collinearity between cell cytoplasm (Cyt) and other protein-rich components (elastin (Ela), keratin, KP, and HF). These components share features of many functional groups, such as C-C stretching around 939 cm⁻¹, amide III around 1270 cm⁻¹, CH modes around 1454 cm⁻¹ and amide I around 1660 cm⁻¹. Considering that (1) cell cytoplasm has a much smaller Raman scattering cross-section and is present in smaller quantity than keratin, and (2) the spectrum of keratin may contain some cell features due to their close proximity, we excluded cytoplasm from our model. In addition, the Raman spectrum of palmitic acid (PA) has a high degree of overlap with triolein (0.94) and ceramide (0.96), so PA was also excluded from the model. In total, we arrived at eight primary Raman active components: collagen, elastin, triolein, cell nucleus, keratin, ceramide, melanin and water (Fig. 8). The peak positions of the main Raman bands are displayed in Table 2.
To validate that these eight components capture the primary skin constituents measured in in vivo human skin cancers, we fit this linear component model to the clinical data set. We determined the relative contribution of the eight model components for each of the pathology groups. Figure 9 shows the fitting result for the mean Raman spectra.
Considering the order of magnitude of the residuals with respect to the bulk tissue spectra, most of the spectroscopic features are well represented. The fit coefficients across each model component are normalized to sum to 1.

Discussion
In this study, we establish a Raman "biophysical model", an inverse model for determining biophysical skin components using in vivo Raman spectroscopy. We built a confocal Raman microscope to identify eight of the most relevant skin constituents contributing to the spectral differences among different skin malignancies. Our model components were found to be consistent with previous studies. Some were commonly used in skin and non-skin models. For instance, collagen and triolein are known to be important contributors to the RS signal of breast, gastric, and artery tissues [17,19,20]. We demonstrated these two components also played an important role for fitting in vivo skin data. Other components were more specific to skin. For instance, Caspers et al. used ceramide to model epidermal lipid in human stratum corneum layer [40]. Silveira et al. included elastin to model skin dermal protein [9]. Keratin was important for in vivo skin to consider the impact of epidermis [22,40] but not necessary for excisional skin fragments because the measurement was on the dermis side [9]. Melanin was important only when pigmented lesions were considered, so it was used to model melanoma skin tissue [9]. However, our model is different from previous biophysical Raman skin models in the following two aspects. First, we used skin constituents in their microenvironment as the basis spectra. Our results showed that it was possible to use a single morphologically derived basis spectrum rather than synthetic/purified chemicals. As demonstrated in the Results section, in situ skin constituents had substantial differences from their corresponding synthetic chemicals, even if their major chemical components were the same. Since in situ constituents are extracted from skin in their natural state, without any further processing, they can better represent the skin microenvironment that cannot be recapitulated in a synthetic environment. 
Second, our model was validated using data from a previous in vivo clinical screening study [12] acquired with a Raman optical fiber probe [27]. To date, the only biophysical Raman skin cancer model has been based on excised fragments of BCC and melanoma skin tissues [9]. We expanded upon this research by applying our model to in vivo skin spectra and covering a wider range of nonmelanoma and melanoma skin cancers and precancers.
While we found a total of 15 measurable Raman components in skin, we found the most consistent model outcomes were achieved when minimizing this number to only eight components. Our approach was to select only one Raman constituent to represent those in situ components that were chemically similar and with high collinearity. Similarly, Stone et al. demonstrated that including both amino acids and the proteins containing them in a linear model skewed the fit coefficients [16]. Our experience is that minimizing the number of protein components resulted in the most consistent fit coefficients.
The biophysical changes of skin derived by our model follow known morphological and biochemical changes in skin malignancy. We observed that there is less triolein in skin cancer/pre-cancer lesions relative to the amount of triolein in normal skin. Triolein is a major form of triglyceride in human skin, which presents as subcutaneous fat and epidermal surface lipid. The apparent decrease in triolein as cancer progresses could be due to: (1) the reduction of subcutaneous fat sampled by the probe, caused by the thickening epidermis during lesion formation; and/or (2) the reduction of membrane lipid synthesis induced by UV damage [34]. Because subcutaneous fat exists in a substantial amount and has large Raman scattering cross section [41] we believe (1) is the major reason. The thickening of epidermis originates from the progression of malignancy [42].
While both melanoma and nonmelanoma skin cancers were included in this study, the direct comparison between BCC or SCC and MM is much less clinically relevant [43]. Thus, we compared nonmelanoma (BCC, SCC, AK) versus normal skin and MM versus PL separately. We observed that the amount of collagen was substantially lower in nonmelanoma skin cancer lesions as compared to normal skin. This could be explained by the breakdown of collagen in dermis due to the role metalloproteinases (MMP) play in degrading collagen and prohibiting procollagen biosynthesis [35]. The thickening of the epidermis also leads to reduced collagen signal collected by the probe. Furthermore, we observed the amount of elastin was higher in BCC lesions as compared to normal skin, potentially resulting from the existence of solar elastosis. Elastosis is characterized by the accumulation of disorganized elastic fibers in the dermis and commonly found in photoaging skin [36]. Finally, we found keratin was substantially higher in SCC compared to the other groups, which suggests massive keratinization disorders during SCC tumor progression [37].
By visual inspection, the mean spectra of MM and PL differ markedly from the mean spectra of the other pathologies. The spectral flattening between 1500 and 1700 cm⁻¹ is caused by increased melanin and pigmentation, indicating that RS is sensitive to pigment-related variations. However, discriminating MM from PL remains the most challenging task in skin cancer screening, resulting in high negative biopsy ratios clinically. In our study, we observed that melanin content in MM is substantially higher than in PL, indicating massive melanocyte proliferation. The significantly lower level of triolein in MM than in PL could be explained by both the reasons given above and by the strong absorption of melanin, which further reduced the signal sampled from subcutaneous fat. In addition, collagen is substantially lower in MM than in PL. This suggests that tumor formation is closely related to changes in its stromal microenvironment that favor its proliferation and eventual metastasis [37,38]. Our model demonstrated that collagen, triolein, and melanin are the most important cancer identifiers for MM. Future work will explore the diagnostic potential of these biophysical parameters in discriminating skin cancers.
We observed a higher fitting residual in MM than in the other tissue types. The basic assumption of our linear fitting model is that the scattering properties of tissue do not significantly distort the Raman spectrum [29], but this assumption may not hold for melanin due to its strong absorption and scattering. Intrinsic Raman spectroscopy may help correct this distortion by relating the observed and intrinsic Raman spectra through diffuse reflectance using a light transport model [44]. We will also explore nonlinear fitting models to improve the fitting, such as partial least-squares (PLS) and support vector machine (SVM). Other factors also contribute to the residuals in general. One factor is that the basis spectra and bulk tissue spectra were acquired with two independent Raman systems, which were composed of different detectors, lenses, beam splitters, etc. Spectral response calibration was used to match the spectral response of the two systems, but it could not completely eliminate the differences in the spectra measured by the two systems. Another factor is the signal generated by probe components, such as the fiber background, epoxy and sapphire [27].
In general, we did not find site-specific constituents that are not covered by the current model, but the concentration of the eight components may vary with location. For example, when the measurement was taken on the scalp surrounded by dark hair, we would detect a melanin signal. Future work will examine how sensitive our model is in picking up such information.
In this study, we proposed the first Raman biophysical model that uses in situ Raman active components as the building blocks, and applied it to in vivo skin cancer screening data. Our results indicate that eight basis spectra derived from collagen, elastin, triolein, cell nucleus, keratin, ceramide, melanin, and water are the most relevant for describing the spectral features of human skin RS data. We make available for download (see Data File 1) a database of Raman spectra for these eight components for others to use as a reference. Our future work will evaluate the performance of this model in discriminating skin cancer pathologies within the context of ongoing clinical studies of Raman spectroscopy for skin cancer screening in our group. We envision our model being used with the Raman probe to analyze individual lesions pointed out by patients or providers. We think it would be reasonable to scan the top ten concerning lesions on any patient without affecting the current patient flow in a physician's office.

Funding
Cancer Prevention and Research Institute of Texas (CPRIT) RP130702.