Model based and empirical spectral analysis for the diagnosis of breast cancer

We explored the use of both empirical (Partial Least Squares, PLS) and Monte Carlo model based approaches for the analysis of fluorescence and diffuse reflectance spectra measured ex vivo from freshly excised breast tissues and for the diagnosis of breast cancer. Features extracted using both approaches, i.e. principal components (PCs) obtained from empirical analysis or tissue properties obtained from model based analysis, displayed statistically significant difference between malignant and non-malignant tissues, and can be used to discriminate breast malignancy with comparable sensitivity and specificity of up to 90%. The PC scores of a subset of PCs also displayed significant correlation with the tissue properties extracted from the model based analysis, suggesting both approaches likely probe the same sources of contrast in the tissue spectra that discriminate between malignant and non-malignant breast tissues but in different ways.
©2008 Optical Society of America
OCIS codes: (170.0170) Medical optics and biotechnology; (300.0300) Spectroscopy; (170.4580) Optical diagnostics for medicine; (170.6510) Spectroscopy, tissue diagnostics.
#95011 $15.00 USD Received 14 Apr 2008; revised 6 Aug 2008; accepted 6 Aug 2008; published 9 Sep 2008 (C) 2008 OSA 15 September 2008 / Vol. 16, No. 19 / OPTICS EXPRESS 14961

References and links
1. C. Zhu, G. M. Palmer, T. M. Breslin, F. Xu, and N. Ramanujam, "The Use of a Multi-separation Fiber Optic Probe for the Optical Diagnosis of Breast Cancer," J. Biomed. Opt. 10, 024032 (2005).
2. C. Zhu, G. M. Palmer, T. M. Breslin, J. Harter, and N. Ramanujam, "Diagnosis of Breast Cancer using Fluorescence and Diffuse Reflectance Spectroscopy: a Monte Carlo Model Based Approach," J. Biomed. Opt. 13, 034015 (2008).
3. C. Zhu, G. M. Palmer, T. M. Breslin, J. Harter, and N. Ramanujam, "Diagnosis of breast cancer using diffuse reflectance spectroscopy: Comparison of a Monte Carlo versus partial least squares analysis based feature extraction technique," Lasers Surg. Med. 38, 714-724 (2006).
4. Y. Yang, A. Katz, E. J. Celmer, M. Zurawska-Szczepaniak, and R. R. Alfano, "Fundamental differences of excitation spectrum between malignant and benign breast tissues," Photochem. Photobiol. 66, 518-522 (1997).
5. Y. Yang, A. Katz, E. J. Celmer, M. Zurawska-Szczepaniak, and R. R. Alfano, "Optical spectroscopy of benign and malignant breast tissues," Lasers in the Life Sciences 7, 115-127 (1996).
6. Y. Yang, E. J. Celmer, M. Zurawska-Szczepaniak, and R. R. Alfano, "Excitation spectrum of malignant and benign breast tissues: a potential optical biopsy approach," Lasers in the Life Sciences 7, 249-265 (1997).
7. G. M. Palmer, C. Zhu, T. M. Breslin, F. Xu, K. W. Gilchrist, and N. Ramanujam, "Comparison of multiexcitation fluorescence and diffuse reflectance spectroscopy for the diagnosis of breast cancer (March 2003)," IEEE Trans. Biomed. Eng. 50, 1233-1242 (2003).
8. G. M. Palmer and N. Ramanujam, "Diagnosis of Breast Cancer Using Optical Spectroscopy," Medical Laser Application 18, 233-248 (2003).
9. P. K. Gupta, S. K. Majumder, and A. Uppal, "Breast cancer diagnosis using N2 laser excited autofluorescence spectroscopy," Lasers Surg. Med. 21, 417-422 (1997).
10. I. J. Bigio, S. G. Bown, G. Briggs, C. Kelley, S. Lakhani, D. Pickard, P. M. Ripley, I. G. Rose, and C. Saunders, "Diagnosis of breast cancer using elastic-scattering spectroscopy: preliminary clinical results," J. Biomed. Opt. 5, 221-228 (2000).
11. I. J. Bigio and J. R. Mourant, "Ultraviolet and visible spectroscopies for tissue diagnostics: fluorescence spectroscopy and elastic-scattering spectroscopy," Phys. Med. Biol. 42, 803-814 (1997).
12. C.-H. Liu, B. B. Das, W. L. Sha Glassman, G. C. Tang, K. M. Yoo, H. R. Zhu, D. L. Akins, S. S. Lubicz, J. Cleary, R. Prudente, E. J. Celmer, A. Caron, and R. R. Alfano, "Raman, fluorescence, and time-resolved light scattering as optical diagnostic techniques to separate diseased and normal biomedical media," J. Photochem. Photobiol., B: Biol. 16, 187-209 (1992).
13. S. K. Majumder, P. K. Gupta, B. Jain, and A. Uppal, "UV excited autofluorescence spectroscopy of human breast tissues for discriminating cancerous tissue from benign tumor and normal tissue," Lasers in the Life Sciences 8, 249-264 (1999).
14. S. V. Pushkarev, S. A. Naumov, S. M. Vovk, V. A. Volovodenko, and V. V. Udut, "Application of laser fluorescence spectroscopy and diffuse reflection spectroscopy in diagnosing the states of mammary gland tissue," Optoelectronics-Instrumentation and Data Processing 2, 71-76 (1999).
15. R. R. Alfano, B. B. Das, J. Cleary, R. Prudente, and E. J. Celmer, "Light sheds light on cancer--distinguishing malignant tumors from benign tissues and tumors," Bull. N. Y. Acad. Med. 67, 143-150 (1991).
16. Y. Yang, E. J. Celmer, J. A. Koutcher, and R. R. Alfano, "UV reflectance spectroscopy probes DNA and protein changes in human breast tissues," J. Clin. Laser Med. Surg. 19, 35-39 (2001).
17. V. G. Peters, D. R. Wyman, M. S. Patterson, and G. L. Frank, "Optical properties of normal and diseased human breast tissues in the visible and near infrared," Phys. Med. Biol. 35, 1317-1334 (1990).
18. N. Ghosh, S. K. Mohanty, S. K. Majumder, and P. K. Gupta, "Measurement of optical transport properties of normal and malignant human breast tissue," Appl. Opt. 40, 176-184 (2001).
19. G. M. Palmer, C. Zhu, T. M. Breslin, F. Xu, K. W. Gilchrist, and N. Ramanujam, "Monte Carlo-based inverse model for calculating tissue optical properties. Part II: Application to breast cancer diagnosis," Appl. Opt. 45, 1072-1078 (2006).
20. G. M. Palmer and N. Ramanujam, "Monte Carlo based Model to Extract Intrinsic Fluorescence from Turbid Media: Theory and Phantom Validation," (2007).
21. H. Martens, Multivariate Calibration (John Wiley & Sons, New York, 1989).
22. G. M. Palmer and N. Ramanujam, "Monte Carlo-based inverse model for calculating tissue optical properties. Part I: Theory and validation on synthetic phantoms," Appl. Opt. 45, 1062-1071 (2006).
23. R. Tauler, "Multivariate Curve Resolution, MCR-ALS Command Line Toolbox," (2006).
24. R. M. Bethea, B. S. Duran, and T. L. Boullion, Statistical methods for engineers and scientists (M. Dekker, New York, 1995).
25. C. Burges, "A Tutorial on Support Vector Machines for Pattern Recognition," Data Mining and Knowledge Discovery 2, 121-167 (1998).
26. N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines: and Other Kernel-based Learning Methods (Cambridge University Press, Cambridge, New York, 2000).
27. J. S. U. Hjorth, Computer Intensive Statistical Methods: Validation, Model Selection, and Bootstrap (Chapman & Hall, London, New York, 1994).

In most previous studies, empirical methods were used for the analysis of fluorescence [1,4,6,7,9,11,13-15] and/or diffuse reflectance spectra [1,5,7,10,14,16] of breast tissues, in which the spectral intensities and/or line shapes were examined to extract spectral patterns (such as intensities, ratios of intensities, or principal components) that consistently discriminate between malignant and non-malignant tissues. Using the empirical analysis methods, malignant breast tissues could be distinguished from normal breast tissues based on the difference in the spectral intensities [1,4-6,9,10,15], or in the spectral line shapes [3,7] between malignant and normal breast tissues. Empirical methods of feature extraction can range from the extraction of simple ratios of fluorescence intensities [15] to chemometric techniques such as Principal Component Analysis [7]. Although empirical analyses can reveal important spectroscopic features for tissue characterization and disease diagnosis, they do not relate the measured spectra directly to the physically meaningful information that they may represent.
Model based approaches are also increasingly being investigated for the analysis of tissue fluorescence and diffuse reflectance spectra [2,3,18,19]. Using model based approaches, the intrinsic fluorescence can be recovered and the absorption and scattering properties can be quantified from the turbid tissue spectra. Several studies [3,18,19] have been reported in which the absorption and scattering properties were quantified from diffuse reflectance spectra to characterize breast tissues. In the two previous studies by our group [3,19], a Monte Carlo based inverse model was applied for the extraction of absorption and scattering properties, including chromophore concentrations and the size and density of scatterers, from the diffuse reflectance spectra of malignant and non-malignant breast tissues. Both studies showed that malignant tissues have decreased hemoglobin saturation, decreased β-carotene concentration and increased mean reduced scattering coefficient compared to non-malignant tissues. A linear Support Vector Machine (SVM) classification based on these diagnostic optical properties yielded a sensitivity and specificity of 86% and 80%, respectively, in [3], and 82% and 92%, respectively, in [19].
Our group has expanded the Monte Carlo model to describe the fluorescence of biological tissue [20]. In a recent study by Zhu et al. [2], a fluorescence model based on scalable Monte Carlo simulations [20] was applied to recover the intrinsic fluorescence of human breast tissues, from which the relative fluorescence contributions of individual fluorophores present in breast tissues, including collagen, NADH and a third component attributed to retinol, were extracted. The study showed that the fluorescence contributions from collagen and NADH were higher, while that of retinol was lower, in malignant and fibrous/benign breast tissues as compared to normal adipose tissues. This study also evaluated the diagnostic value of fluorescence properties alone, absorption and scattering properties alone, and the combination of the fluorescence, absorption and scattering properties for the discrimination of breast malignancy. It was shown that using the intrinsic tissue properties, malignant breast tissues could be discriminated from non-malignant tissues with a sensitivity of 80-85% and a specificity of 85-90%.
Few previous studies have systematically compared the performance of empirically based analyses to that of model-based approaches for the diagnosis of breast cancer. Our group has published independent studies that use either model-based approaches or empirical methods to extract features from diffuse reflectance and/or fluorescence spectra for discrimination between malignant and non-malignant breast tissues. These independent studies were carried out on different patient cohorts. The goal of the study presented here is to systematically compare the classification based on principal component scores extracted using Partial Least Squares (empirical approach) vs. intrinsic tissue properties extracted from Monte Carlo inverse modeling (model based approach) on the same patient cohort. The actual classification scheme was based on a linear Support Vector Machine algorithm. The correlation between the PC scores and the intrinsic tissue properties was also evaluated in order to examine the relationship between the features extracted using the empirical and model based approaches.

Sample collection and data acquisition
The breast tissue optical spectroscopy study was approved by the Institutional Review Boards (IRB) at the University of Wisconsin-Madison. Breast tissue samples were obtained from patients undergoing either a lumpectomy or a mastectomy. Fluorescence and diffuse reflectance spectra were measured from freshly excised breast tissue samples soon after excision, and accompanying histological diagnoses were obtained for each specimen from microscopic evaluation of hematoxylin and eosin (H&E) sections by a board certified pathologist (JH). A total of 83 tissue samples, including 37 malignant, 2 benign, 6 fibrous and 38 adipose samples, were included in this study for data analysis. This sample set is a subset of the tissue samples reported in a previous publication [2], where only model-based analysis was performed. Table 1 shows (a) the histological breakdown of the breast samples investigated in this study, and (b) the distribution of percent malignancy in malignant tissue samples.
In this study, tissue fluorescence and diffuse reflectance were measured with a fiber optic probe coupled to a multi-wavelength optical spectrometer. The fiber optic probe and spectrometer have been described in detail in our earlier publications [1,3]. Tissue fluorescence emission was measured at excitation wavelengths ranging from 300 to 460 nm and, at each excitation wavelength, over a spectral range of 260 nm with the first emission wavelength red-shifted by 20 nm from the excitation. Tissue diffuse reflectance was measured over a wavelength range of 350-600 nm. Both measured fluorescence and diffuse reflectance spectra were calibrated in order to correct for the (1) background spectrum, (2) wavelength dependence, and (3) throughput of the system. Details about the instrument settings and calibrations for the fluorescence and diffuse reflectance measurements can be found in earlier publications [1,3].

Partial Least Squares (PLS) analysis of tissue spectra
Partial Least Squares (PLS) analysis [21] was used for the empirical spectral analysis of the tissue fluorescence and diffuse reflectance spectra. Data obtained in the clinical study can be divided into "independent" (or "predictor") and "dependent" (or "response") variables. The "independent" variables are in the spectral data matrix X, and the "dependent" variable Y is a binary variable that represents the histological diagnosis of each sample, with "1" for malignant tissues and "0" for non-malignant (benign and normal) tissues. PLS is a regression method that seeks a linear model relating a dependent (response) variable Y to a set of independent (predictor) variables X. The general idea of PLS is to extract a set of principal components (PCs) that account for as much of the variance in X as possible and are also relevant to the Y variable. Details about the PLS analysis can be found in reference [21] and also in our previous publications [1,3].
Prior to PLS analysis, each fluorescence spectrum was normalized to the integrated spectral intensity over the entire spectrum. This pre-processing removed inter-patient variations and possible intra-patient variations due to variations in probe-tissue contact. Each diffuse reflectance spectrum was normalized point-by-point to a reference phantom spectrum, which is also employed in the Monte Carlo model based approach for pre-processing [3]. This normalization method was chosen for the diffuse reflectance spectra so that the empirical and model based approaches could be compared on spectra that were pre-processed the same way. PLS analysis was then carried out on the normalized fluorescence spectra measured at each of the excitation wavelengths within 340-400 nm, one at a time, and separately on the normalized diffuse reflectance spectra. The analyzed spectra were restricted to this excitation wavelength range because this is the range in which the primary fluorophores in breast tissue were characterized using the Monte Carlo model based analysis. The spectra measured at 300 and 320 nm were excluded because the optical properties at these wavelengths were not available, so the model based analysis of fluorescence spectra could not be applied. The spectra measured at 420-460 nm were excluded because of poor signal to noise. The extracted PCs that together account for 95% or more of the variance in the spectral data were retained for further analysis.
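The pre-processing and PC retention steps described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names are hypothetical, the integrated intensity is approximated by a plain sum (which assumes uniformly spaced wavelengths), and the per-PC variance fractions are assumed to come from a separate PLS routine that is not shown.

```python
def normalize_to_area(spectrum):
    """Divide a spectrum by its integrated intensity.

    Assumption: wavelengths are uniformly spaced, so a plain sum
    stands in for the integral.
    """
    total = sum(spectrum)
    if total == 0:
        raise ValueError("spectrum has zero integrated intensity")
    return [value / total for value in spectrum]


def normalize_to_reference(spectrum, reference):
    """Divide a reflectance spectrum point-by-point by a reference spectrum."""
    if len(spectrum) != len(reference):
        raise ValueError("spectrum and reference must have the same length")
    return [s / r for s, r in zip(spectrum, reference)]


def n_components_for_variance(explained_fractions, threshold=0.95):
    """Return how many leading PCs are needed to reach the variance threshold.

    `explained_fractions` is a hypothetical input: the fraction of variance
    explained by each PC, in order, as produced by some PLS routine.
    """
    cumulative = 0.0
    for count, fraction in enumerate(explained_fractions, start=1):
        cumulative += fraction
        if cumulative >= threshold:
            return count
    return len(explained_fractions)


if __name__ == "__main__":
    print(normalize_to_area([1.0, 1.0, 2.0]))
    print(normalize_to_reference([2.0, 4.0], [4.0, 8.0]))
    print(n_components_for_variance([0.831, 0.10, 0.03, 0.02]))
```

The variance fractions in the usage lines are made-up numbers chosen only to exercise the threshold logic.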

Monte Carlo model based analysis of tissue spectra
A Monte Carlo inverse model of diffuse reflectance was used to extract the absorption and scattering properties from the measured diffuse reflectance spectra. Details about the model of diffuse reflectance have been described in reference [22], and the parameter settings for the inverse modeling procedure can be found in our earlier publication [2]. Briefly, in this model, the primary intrinsic absorbers in breast tissues over the UV-VIS spectrum were assumed to be oxygenated and deoxygenated hemoglobin and β-carotene, and the scatterers were assumed to be single-sized, spherical and uniformly distributed. In the forward modeling process, the tissue absorption coefficients were calculated from the concentrations of absorbers and their corresponding extinction coefficients, and the scattering coefficients were calculated from scatterer size and density using Mie theory. The calculated absorption and scattering coefficients were then used in Monte Carlo simulations to obtain a modeled diffuse reflectance. In the inverse process, the modeled diffuse reflectance was adaptively fitted to the measured tissue diffuse reflectance by adjusting the input concentrations of absorbers and the size and density of scatterers. The tissue parameters extracted from the inverse model were the optimal input values obtained upon convergence of the optimization procedure. The fitting was repeated twenty times, each time with different initial inputs. The set of fitted tissue parameters associated with the smallest mean squared error among the twenty was chosen as the final set of extracted tissue parameters. The absorption properties extracted from the model and used in further data analysis included β-carotene concentration, total hemoglobin concentration and hemoglobin saturation; the latter two were calculated directly from the concentrations of oxygenated and deoxygenated hemoglobin. The mean reduced scattering coefficient was calculated from the size and density of the scatterers and used to describe the bulk scattering properties of tissue.
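Two of the bookkeeping steps in this paragraph are simple enough to sketch: deriving total hemoglobin and hemoglobin saturation from the two fitted hemoglobin concentrations, and keeping the restart with the smallest mean squared error. The sketch below is an illustration under assumptions, not the published inverse model: `fit_once` is a hypothetical callable standing in for one run of the Monte Carlo based fit, and the toy fit in the usage lines exists only to show the selection logic.

```python
def hemoglobin_summary(c_oxy, c_deoxy):
    """Return (total hemoglobin, saturation in percent).

    Total hemoglobin is the sum of the oxygenated and deoxygenated
    concentrations; saturation is the oxygenated fraction of that total.
    """
    total = c_oxy + c_deoxy
    if total <= 0:
        raise ValueError("total hemoglobin must be positive")
    return total, 100.0 * c_oxy / total


def best_of_restarts(fit_once, initial_guesses):
    """Run a fit from several starting points and keep the best result.

    `fit_once(guess)` is a hypothetical function returning a pair
    (fitted_parameters, mean_squared_error).
    """
    results = [fit_once(guess) for guess in initial_guesses]
    return min(results, key=lambda result: result[1])


if __name__ == "__main__":
    print(hemoglobin_summary(30.0, 10.0))

    # Toy stand-in for one inverse-model run: the "fit" just reports the
    # guess and its squared distance from an arbitrary target value of 3.
    def toy_fit(guess):
        return (guess,), (guess - 3.0) ** 2

    print(best_of_restarts(toy_fit, [0.0, 2.0, 5.0]))
```

In the study the fit was restarted twenty times; the three starting points here are only for brevity.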
For fluorescence, a Monte Carlo model of fluorescence [20] was employed for the extraction of intrinsic fluorescence spectra of breast tissues. This model is capable of removing the distorting effects of absorption and scattering and is valid for a wide range of optical properties. The Monte Carlo model simulates photon propagation (including both incident photons and fluorescence re-emission photons) within a turbid medium, given a set of known absorption and scattering coefficients and the illumination-collection geometry. In our approach, the absorption and scattering coefficients of tissue were derived from the concomitantly measured diffuse reflectance spectrum using the Monte Carlo based inverse model of diffuse reflectance described above. Next, these optical properties, as well as the collection efficiency defined by the specific probe geometry, were used in the Monte Carlo simulation of fluorescence to generate a correction curve that accounts for the effects of absorption and scattering on the fluorescence emission and the effect of the probe geometry on fluorescence measurements. Intrinsic fluorescence spectra of tissues were obtained by dividing the measured fluorescence spectra point-by-point by the correction curve. This fluorescence model has been described in detail elsewhere [20]. A Multivariate Curve Resolution (MCR) method [23] was then used to extract, from the intrinsic fluorescence excitation-emission matrix (EEM), the relative fluorescence contributions of several fluorescent components, including collagen, NADH and a third component, which we tentatively assigned to retinol. Each intrinsic fluorescence EEM was normalized by dividing each intensity-wavelength point by the integrated intensity over the entire EEM prior to MCR analysis. Details about the extraction of intrinsic fluorescence of breast tissues using the model based approach and about the MCR analysis can be found in reference [2].
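The last two arithmetic steps of this procedure, the point-by-point correction and the EEM normalization, can be sketched as below. This is a minimal illustration only: the correction curve is taken as a given input (generating it requires the Monte Carlo simulation, which is not shown), the function names are hypothetical, and the integrated EEM intensity is approximated by a plain sum over all points.

```python
def intrinsic_fluorescence(measured, correction):
    """Divide a measured spectrum point-by-point by a correction curve.

    `correction` is assumed to have been produced elsewhere (for example
    by a Monte Carlo fluorescence simulation) on the same wavelength grid.
    """
    if len(measured) != len(correction):
        raise ValueError("measured and correction must have the same length")
    return [m / c for m, c in zip(measured, correction)]


def normalize_eem(eem):
    """Normalize an excitation-emission matrix by its integrated intensity.

    `eem` is a list of rows, one emission spectrum per excitation
    wavelength. A plain sum stands in for the integral.
    """
    total = sum(sum(row) for row in eem)
    if total == 0:
        raise ValueError("EEM has zero integrated intensity")
    return [[value / total for value in row] for row in eem]


if __name__ == "__main__":
    print(intrinsic_fluorescence([2.0, 3.0], [0.5, 1.5]))
    print(normalize_eem([[1.0, 1.0], [2.0, 4.0]]))
```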

Statistical analysis and classification
A Wilcoxon rank-sum test [24] was performed separately on the PC scores and the extracted tissue properties to identify which ones show statistically significant contrast between (a) malignant vs. fibrous/benign, (b) malignant vs. adipose, and (c) malignant vs. non-malignant breast tissue samples. For discriminating malignant from non-malignant samples, the variables (extracted PC scores or tissue properties) that displayed statistically significant differences between the two tissue types were used for tissue classification. A linear Support Vector Machine (SVM) [25,26] classifier was used in this study to classify each tissue sample as malignant or non-malignant. The performance of the SVM classifier was evaluated using a leave-one-out cross validation scheme [27]. A hold-out validation scheme was also tested, which yielded classification results similar to those of the leave-one-out validation (results not shown here). Classification was carried out separately using either the scores of diagnostically significant PCs or the extracted tissue properties as inputs. In the case where the diagnostically significant PCs were used for classification, only a selected subset of them was used as the input to the classifier, because the scores of the PCs that discriminated between malignant and non-malignant tissues may be correlated. To remove data redundancy, only the PC scores that were diagnostically most significant and also uncorrelated were input to the classification algorithm to diagnose breast malignancy. Two sets of PC scores were considered uncorrelated if p > 0.01 for the Pearson correlation.
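The leave-one-out scheme can be sketched as follows. To keep the example self-contained, a nearest-centroid rule is swapped in for the linear SVM used in the study; it is only a stand-in so that the cross-validation loop has something to call, and the function names and toy data are hypothetical.

```python
def centroid(rows):
    """Mean feature vector of a non-empty list of samples."""
    count = len(rows)
    return [sum(column) / count for column in zip(*rows)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid_predict(train_x, train_y, sample):
    """Stand-in classifier (NOT an SVM): pick the closer class centroid.

    Assumes both classes are present in the training data.
    """
    malignant = centroid([x for x, y in zip(train_x, train_y) if y == 1])
    benign = centroid([x for x, y in zip(train_x, train_y) if y == 0])
    closer_to_malignant = (
        squared_distance(sample, malignant) < squared_distance(sample, benign)
    )
    return 1 if closer_to_malignant else 0


def leave_one_out(features, labels, predict=nearest_centroid_predict):
    """Return (sensitivity, specificity) from leave-one-out cross validation.

    Labels follow the coding in the text: 1 = malignant, 0 = non-malignant.
    """
    tp = fn = tn = fp = 0
    for i in range(len(features)):
        # Train on every sample except the i-th, then test on the i-th.
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        guess = predict(train_x, train_y, features[i])
        if labels[i] == 1:
            if guess == 1:
                tp += 1
            else:
                fn += 1
        else:
            if guess == 0:
                tn += 1
            else:
                fp += 1
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    toy_features = [[0.0], [0.2], [0.1], [1.0], [1.2], [0.3]]
    toy_labels = [0, 0, 0, 1, 1, 1]
    print(leave_one_out(toy_features, toy_labels))
```

Any classifier with the same `predict(train_x, train_y, sample)` shape, including a linear SVM from a machine learning library, could be passed in place of the stand-in.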
In addition to the analyses on each individual set of data (PC scores or the extracted tissue properties), a cross correlation analysis was also performed to examine the relationship between fluorescence PCs and the extracted fluorophore properties, and that between reflectance PCs and the absorption and scattering properties. Pearson correlation was used to evaluate the correlation, and a correlation was considered significant if p < 0.01.
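For reference, the Pearson correlation coefficient between a set of PC scores and a tissue property can be computed as in the sketch below. It returns the coefficient r and the associated t statistic with n - 2 degrees of freedom; converting that statistic to a p-value needs a Student t distribution, which is not in the Python standard library and is therefore left out. The function names and data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson linear correlation coefficient between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return sxy / math.sqrt(sxx * syy)


def correlation_t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1.0 - r * r))


if __name__ == "__main__":
    scores = [1.0, 2.0, 3.0, 4.0]
    tissue_property = [1.0, 3.0, 2.0, 4.0]
    r = pearson_r(scores, tissue_property)
    print(r, correlation_t_statistic(r, len(scores)))
```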

Results from empirical spectral analysis
Table 2 shows the PCs identified by the Wilcoxon rank-sum test as displaying statistically significant differences between malignant and other non-malignant breast tissues (at a significance level of p < 0.05). The variance that each PC accounts for is also listed in the table. PC2 obtained from the diffuse reflectance spectra (reflectance PC2) was the only PC that displayed a statistically significant difference (p < 0.01) between malignant and fibrous/benign breast tissues, and none of the PCs obtained from fluorescence spectra showed a statistically significant difference between these two tissue types. The set of PCs that displayed statistically significant differences between malignant and adipose tissue was the same set that displayed statistically significant differences between malignant and non-malignant breast tissues. The PC1s obtained from fluorescence spectra at each excitation wavelength (especially those obtained at excitation of 340, 360 and 380 nm), as well as PC1 obtained from diffuse reflectance spectra, displayed the statistically most significant contrast between malignant and non-malignant breast tissues (with p < 1e-4 at least). These PCs also accounted for a majority of the variance contained in the spectral data.
Table 3 shows the pairwise linear correlation coefficients between the scores of PCs that were listed in Table 2 as displaying statistically significant differences between malignant and non-malignant tissue samples. The pairs of PCs whose scores were significantly correlated, i.e. p < 0.01, are marked with *. Each PC is denoted with the corresponding excitation wavelength (or the string "Refl" in the case of diffuse reflectance) followed by the PC#.
Table 3. Pairwise linear correlation coefficients between the scores of PCs that were listed in Table 2 as displaying statistically significant difference between malignant and non-malignant tissue samples. The pairs of PCs whose scores were significantly correlated, i.e. p < 0.01, are marked with an asterisk (*). The highlighted PCs (340nm PC1 and 360nm PC2) were selected for tissue classification.
Amongst the fluorescence PCs, 340nm PC1 was the one that displayed statistically the most significant difference between malignant and non-malignant tissue samples (p < 1e-9), and this excitation wavelength can excite three potential fluorophores: collagen, NADH and retinol. This PC was thus selected as the first key fluorescence variable for the classification between malignant and non-malignant tissue samples. The other fluorescence PCs used in classification were chosen as the ones that not only displayed the statistically most significant differences between malignant and non-malignant samples, but also were uncorrelated with 340nm PC1. The latter criterion of non-correlation was applied in order to reduce the number of inputs to the classifier, and also to reduce the redundancy in the fluorescence spectral data used for the diagnosis of breast malignancy. Based on these criteria, the other fluorescence PC selected for tissue classification was the 360nm PC2, which displayed a statistically significant difference between malignant and non-malignant breast tissues at p < 1e-4.
Table 4. Results from the leave-one-out cross validation of a linear SVM classification for discriminating malignant and non-malignant breast tissue samples using (1) two fluorescence PCs that are diagnostically the most significant and uncorrelated (i.e. 340nm PC1 and 360nm PC2); (2) two reflectance PCs that are diagnostically most significant (Refl PC1 and Refl PC2); and (3) the combined PCs of (1) and (2).
Table 4 shows the results from the leave-one-out cross validation of a linear SVM classification for discriminating malignant and non-malignant breast tissue samples using (1) two fluorescence PCs that are diagnostically the most significant and uncorrelated (i.e. 340nm PC1 and 360nm PC2); (2) two reflectance PCs that are diagnostically most significant (Refl PC1 and Refl PC2); and (3) the combined PCs of (1) and (2). The SVM classification using diagnostic fluorescence PCs alone achieved a cross-validated sensitivity and specificity of 83.8% and 87.0%, respectively, and that using diagnostic reflectance PCs alone achieved a cross-validated sensitivity and specificity of 86.5% and 89.1%, respectively, for the discrimination between malignant and non-malignant tissue samples. When both sets of PCs were combined for the classification, malignant samples were discriminated from non-malignant samples with a sensitivity of 86.5% and a specificity of 89.1%.
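As a reading aid, the reported percentages can be related to sample counts. The sketch below computes sensitivity and specificity from a confusion table; the counts in the usage lines are back-calculated here from the reported percentages and the sample sizes (37 malignant, 46 non-malignant) and are an inference, not numbers stated in the text.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) in percent.

    tp/fn count malignant samples classified correctly/incorrectly;
    tn/fp count non-malignant samples classified correctly/incorrectly.
    """
    sensitivity = 100.0 * tp / (tp + fn)
    specificity = 100.0 * tn / (tn + fp)
    return sensitivity, specificity


if __name__ == "__main__":
    # Inferred counts (assumption): 31 of 37 malignant and 40 of 46
    # non-malignant samples correct for the fluorescence PCs.
    fluo = sensitivity_specificity(tp=31, fn=6, tn=40, fp=6)
    # Inferred counts (assumption): 32 of 37 and 41 of 46 for reflectance PCs.
    refl = sensitivity_specificity(tp=32, fn=5, tn=41, fp=5)
    print([round(v, 1) for v in fluo])
    print([round(v, 1) for v in refl])
```

With a denominator of 37, each misclassified malignant sample shifts the sensitivity by about 2.7 percentage points, which is worth keeping in mind when comparing the feature sets.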

Results from Monte Carlo model based analysis
Table 5 shows the results from the Wilcoxon rank-sum test on the extracted absorption, scattering and fluorescence properties, which showed statistically significant differences (at least p < 0.05) between (1) malignant and fibrous/benign; (2) malignant and adipose; and (3) malignant and non-malignant breast tissues (fluorescence properties are marked with *). Only the β-carotene concentration and the hemoglobin saturation displayed statistically significant differences between malignant and fibrous/benign breast tissues. All properties, including absorption, scattering and fluorescence, displayed statistically significant differences between malignant and adipose, as well as between malignant and non-malignant tissue types.
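For readers unfamiliar with the test, a Wilcoxon rank-sum comparison of one tissue property between two groups can be sketched as follows. This version uses the large-sample normal approximation with average ranks for ties and no tie correction of the variance; that choice is an assumption made for brevity, and the source does not say which variant of the test was used. The function name and data are hypothetical.

```python
import math


def rank_sum_test(group_a, group_b):
    """Two-sided Wilcoxon rank-sum test via the normal approximation.

    Returns (z, p). Ties receive average ranks; the variance is not
    tie-corrected, so results are approximate for small or heavily tied data.
    """
    pooled = sorted(list(group_a) + list(group_b))
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        # Positions i..j-1 hold ranks i+1..j; assign their average.
        rank_of[pooled[i]] = (i + 1 + j) / 2.0
        i = j

    n_a, n_b = len(group_a), len(group_b)
    w = sum(rank_of[value] for value in group_a)
    mean_w = n_a * (n_a + n_b + 1) / 2.0
    sd_w = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12.0)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail probability
    return z, p


if __name__ == "__main__":
    malignant = [1.0, 2.0, 3.0]
    non_malignant = [4.0, 5.0, 6.0]
    print(rank_sum_test(malignant, non_malignant))
```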
Table 6 shows the results from the leave-one-out cross validation of a linear SVM classification for discriminating malignant from non-malignant breast tissue samples using (1) fluorescence properties only (relative fluorescence contribution of collagen, NADH and retinol), (2) absorption and scattering properties only (mean μs', β-carotene concentration and hemoglobin saturation), and (3) the combination of fluorescence, absorption and scattering properties in (1) and (2). The total hemoglobin concentration was not included in the classification because the cross validation results showed that including this property did not improve the classification accuracy. Using the fluorescence properties alone, the malignant tissue samples were discriminated from non-malignant tissue samples with a cross-validated sensitivity and specificity of 86.5% and 80.4%, respectively. Using absorption and scattering properties alone, the SVM classification discriminated between malignant and non-malignant tissue samples with a cross-validated sensitivity and specificity of 86.5% and 89.1%, respectively. Combining all fluorescence, absorption and scattering properties for the classification yielded a sensitivity and specificity of 89.2% and 89.1% for classifying malignant from non-malignant tissue samples.
Table 7 lists the fraction of misclassified samples in each tissue category. The sample number of each misclassified sample is listed in parentheses. Malignant samples were broken down by percentage of malignancy and non-malignant samples were broken down by tissue type. In all classifications, 2 to 3 out of 9 samples with 0-25% malignancy, up to 2 out of 11 samples with 25-50% malignancy, and up to 2 out of 12 samples with 50-75% malignancy were misclassified in cross-validation. Only 1 out of 5 samples with 75% or more malignancy was misclassified when fluorescence PCs or fluorescence properties were used for classification. None was misclassified when the reflectance variables were used either alone or in combination with the fluorescence variables. For non-malignant samples, 3 to 6 out of 8 fibrous/benign samples were misclassified, while only 1 to 3 out of 38 adipose samples were misclassified in cross validation. More fibrous/benign samples were misclassified when the fluorescence variables alone were used.

Relationship between PCs and tissue properties
Table 8 shows (a) the correlation between fluorescence PCs and the extracted fluorophore properties; and (b) the correlation between reflectance PCs and the absorption and scattering properties. Correlation was considered significant if p < 0.01, in which case the correlation coefficient is shown in the table; otherwise the symbol Ø is shown, indicating no significant correlation. The correlation coefficients marked with ** have a p-value of p < 1e-6. The PCs shown here are the ones that accounted for the vast majority of variance in the spectral data, i.e. PC1 or PC2. Other PCs that only accounted for a small portion of data variance (mostly < 5%) were not evaluated for their correlation with tissue properties. Table 8(a) shows that 340nm PC1, 360nm PC1 and 380nm PC1 are most significantly correlated with the fluorescence contribution of collagen and retinol (p < 1e-6). The correlation with collagen fluorescence was positive, while that with the retinol fluorescence was negative. These PCs also displayed correlation with the fluorescence contribution of NADH; however, the correlation was not as significant as that with the fluorescence contribution of collagen and retinol. The 340nm PC2, 360nm PC2 and 380nm PC2 were most significantly correlated with the fluorescence contribution of NADH (p < 1e-6). The 400nm PC1 displayed significant positive correlation with the fluorescence contribution of NADH, while it was negatively correlated with the fluorescence contribution of retinol. The 400nm PC2 had significant positive correlation with the fluorescence contribution of collagen and significant negative correlation with the fluorescence contribution of retinol. Table 8(b) shows that the PC1 obtained from diffuse reflectance (Refl PC1) was most significantly correlated with the extracted mean reduced scattering coefficient (mean μs') and the β-carotene concentration (p < 1e-6). It had a positive correlation with mean μs' and a negative correlation with β-carotene concentration. Refl PC1 also displayed negative correlation with hemoglobin saturation %HbO2 (p < 0.01); however, the correlation was not as significant as that with mean μs' and β-carotene concentration. The Refl PC2 displayed significant positive correlation with β-carotene concentration.
Figure 1 shows (a) average fluorescence spectra of malignant (n = 37) and non-malignant (n = 46) tissue samples at the excitation wavelength of 340 nm, (b) 340nm PC1 and 360nm PC2 obtained from the fluorescence spectra, (c) the correlation of 340nm PC1 with the extracted fluorescence contribution of collagen and retinol, and (d) the correlation of 360nm PC2 with the extracted fluorescence contribution of NADH. Fig. 1(a) shows that the fluorescence spectra of malignant tissues displayed two peaks at 390 nm and 450 nm, respectively, and that non-malignant tissue fluorescence displayed a shoulder within 460-520 nm. Fluorescence intensity in malignant tissues is higher within the wavelength range of 360-460 nm and lower within 470-620 nm, as compared to that in non-malignant tissues. In Fig. 1(b), the 340nm PC1, which accounts for 83.1% of data variance, displayed positive values within the wavelength range of 360-460 nm, and negative values at wavelengths above 460 nm. The 360nm PC2, which accounts for 15.6% of data variance, displayed a shoulder with positive values between 430 and 510 nm, and negative values at wavelengths below 430 nm and above 510 nm. In Fig. 1(c), 340nm PC1 is significantly and positively correlated with the relative fluorescence contribution of collagen, while negatively correlated with the relative fluorescence contribution of retinol. In Fig. 1(d

Comparing malignant vs. fibrous tissues using balanced sample sets
The results reported above were obtained using tissue samples from breast cancer surgery, which had unbalanced sample sizes for malignant and fibrous/benign tissues (i.e., 37 vs. 8). The sample size for fibrous/benign tissues may not be large enough to test the statistical difference between malignant and fibrous/benign tissues. To address this concern, we incorporated another set of normal fibrous tissue samples (a total of 22) obtained from patients undergoing breast reduction surgery as inputs into the model based analysis approach. This set of tissue samples was measured following the same tissue collection, handling, measurement and histopathology protocols, and has been reported in reference [2]. The fluorescence and diffuse reflectance spectra of these normal fibrous tissue samples were processed and analyzed using the model based approach in the same way as described in the Methods section. The PLS analysis was not conducted on this data set, as this method requires that the additional data set be pooled together with the existing data set for the extraction of principal components. Incorporating the 22 normal fibrous tissue samples obtained from breast reduction increased the total number of fibrous/benign samples to 30, which is comparable to the sample size of the malignant tissues. Wilcoxon rank-sum tests were performed on the combined sample sets to identify which of the extracted tissue properties showed statistically significant differences between malignant and fibrous/benign breast tissues. A linear SVM classification was then carried out to test the accuracy of using the diagnostically significant tissue properties for discriminating malignant from fibrous/benign breast tissues.
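The group comparison described above can be illustrated with a minimal sketch of a two-sided Wilcoxon rank-sum test. This is not the authors' implementation: it uses the large-sample normal approximation without a tie correction, the function names `average_ranks` and `rank_sum_test` are hypothetical, and the input values are synthetic placeholders rather than measured tissue properties. Only the group sizes (37 malignant, 30 fibrous/benign) are taken from the text.

```python
from math import sqrt
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided rank-sum test (normal approximation, no tie correction)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                      # rank sum of the first group
    mean_w = n1 * (n1 + n2 + 1) / 2          # expected rank sum under H0
    sd_w = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Synthetic placeholder values for one tissue property (NOT measured data).
malignant = list(range(30, 67))        # 37 samples
fibrous_benign = list(range(55, 85))   # 30 samples
z, p = rank_sum_test(malignant, fibrous_benign)
print(f"z = {z:.2f}, significant at 0.05: {p < 0.05}")
```

In practice an established statistics package would normally be preferred, since it also handles tie corrections and exact small-sample p-values.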
Table 9 shows the results from the Wilcoxon rank-sum test on the extracted tissue properties for the statistically significant difference between malignant and fibrous/benign breast tissues when breast reduction samples were combined into the sample set. The mean reduced scattering coefficient (μs', p < 0.05), β-carotene concentration (p < 0.05), hemoglobin saturation (p < 1e-5), and the fluorescence contribution from collagen (p < 0.05) and retinol (p < 0.01) showed statistically significant differences between malignant and fibrous/benign breast tissues.
Table 9. Results from the Wilcoxon rank-sum test on the extracted tissue properties for the statistically significant difference between malignant and fibrous/benign breast tissues when breast reduction samples were combined into the sample set.
Table 10 shows the diagnostic accuracy of a linear SVM classification for discriminating malignant from fibrous/benign breast tissues, using (1) diagnostically significant fluorescence properties only, (2) diagnostically significant absorption and scattering properties only, and (3) the combined tissue properties. Using diagnostically significant fluorescence properties (collagen and retinol fluorescence), malignant and fibrous/benign tissues can be discriminated with a cross-validated sensitivity of 51.4% and a specificity of 50%. Using diagnostically significant absorption and scattering properties (mean reduced scattering coefficient μs', β-carotene concentration, and hemoglobin saturation), malignant tissues can be discriminated from fibrous/benign breast tissues with a cross-validated sensitivity of 59.5% and a specificity of 86.7%. Combining all diagnostically significant tissue properties yielded the same classification accuracy as that obtained using absorption and scattering properties only. A nonlinear SVM classifier with a radial basis function kernel was also developed; however, it did not improve the classification accuracy, and the results are therefore not reported here.
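As a worked check of how the sensitivity and specificity figures relate to sample counts, the sketch below computes both from true and predicted labels. The helper name `sensitivity_specificity` is hypothetical, and the per-class counts are not given in the text: they are back-calculated under the assumption that all 37 malignant and 30 fibrous/benign samples were classified in the cross-validation. Under that assumption, 22 of 37 and 26 of 30 are the only integer counts that round to the reported 59.5% and 86.7%.

```python
def sensitivity_specificity(true_labels, predicted_labels, positive="malignant"):
    """Return (sensitivity, specificity) with `positive` as the disease class."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Assumed counts (back-calculated, not stated in the paper):
# 22 of 37 malignant and 26 of 30 fibrous/benign samples classified correctly.
truth = ["malignant"] * 37 + ["fibrous/benign"] * 30
predicted = (["malignant"] * 22 + ["fibrous/benign"] * 15
             + ["fibrous/benign"] * 26 + ["malignant"] * 4)

sens, spec = sensitivity_specificity(truth, predicted)
print(f"sensitivity = {100 * sens:.1f}%, specificity = {100 * spec:.1f}%")
```

The same arithmetic applied to the fluorescence-only result gives 19 of 37 (51.4%) and 15 of 30 (50%), again under the stated assumption about the sample counts.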

Discussion
In this paper we have compared an empirical approach and a Monte Carlo model based approach for the analysis of tissue fluorescence and diffuse reflectance spectra and for the diagnosis of breast cancer. Both approaches can extract diagnostically useful features from the fluorescence and diffuse reflectance spectra, thus reducing the large number of spectral variables to a few features that can be used for the discrimination of breast cancer.
The classification results obtained using the features extracted by the two approaches (i.e., PCs or tissue properties) were comparable, which suggests that both approaches are equally effective for the discrimination of breast malignancy.
In most classifications, the percentage of misclassified samples increased with decreasing percentage of malignancy, as would be expected. The misclassified samples with 0-25% malignancy were either invasive lobular cancer or carcinoma in situ, which account for only a small portion of malignant samples. These sub-types of malignancy may be underrepresented due to the small number of tissue samples as well as the low percentage of malignancy present in these tissue samples.
In the empirical approach, PLS was employed to extract a set of components that represent the tissue spectra with dramatically reduced dimension. This method, rather than Principal Component Analysis (PCA), which is widely used for data reduction, was employed because a previous study [1] showed that PLS has the advantage of taking into account the histological diagnosis of tissue samples when extracting the principal components. Only one PC obtained from the diffuse reflectance displayed statistically significant differences between malignant and fibrous/benign breast tissues, while PCs obtained from fluorescence spectra primarily displayed statistically significant differences between malignant and adipose tissues.
Classifications based on the diagnostically useful PCs provided a sensitivity and specificity of up to 87% and 89% for the discrimination between malignant and non-malignant breast tissues. Classification using combined fluorescence and reflectance PCs yielded sensitivity and specificity similar to those obtained using reflectance PCs only. Although adding fluorescence does not seem to increase the diagnostic accuracy in this study, fluorescence does provide diagnostically useful information, because four of the ten samples misclassified using reflectance PCs alone were correctly classified using fluorescence PCs. The remaining six samples were consistently misclassified using fluorescence PCs, reflectance PCs, or the combination of both (see Table 7). Combining the fluorescence and reflectance PCs directly in an SVM classification may not make full use of the complementary information that each type of spectra may contain, since a separating hyperplane in a higher dimensional space is not a direct combination of the hyperplanes in its lower dimensional sub-spaces. However, a strategy that uses fluorescence PCs and reflectance PCs separately (e.g., sequentially) may have the potential to improve the overall classification.
In the model based approach, all the extracted tissue properties displayed statistically significant differences between malignant and adipose, and between malignant and non-malignant breast tissues (p < 0.05). For the tissue samples obtained from cancer surgery, only hemoglobin saturation and the mean reduced scattering coefficient extracted from the diffuse reflectance spectra showed statistically significant differences between malignant and fibrous/benign tissues. However, only 8 fibrous/benign samples were available for this analysis. Incorporating the normal fibrous samples obtained from breast reduction surgery increased the sample size such that it is comparable to that of the malignant tissue samples. Results from the Wilcoxon rank-sum test (Table 9) indicated that most of the extracted tissue properties, except for the total hemoglobin concentration and the fluorescence contribution of NADH, showed statistically significant differences (at least p < 0.05) between malignant and fibrous/benign tissues. Classification of malignant vs. fibrous/benign tissue samples (Table 10) using diagnostically significant absorption and scattering properties yielded higher sensitivity and specificity than classification using diagnostically significant fluorescence properties only. Combining the two sets of tissue properties did not improve the diagnostic accuracy, suggesting that the diagnosis is primarily attributable to the difference in absorption and scattering properties (especially the hemoglobin saturation).
The tissue samples used in this study are a subset of the tissue samples used in a previous study, which we have reported in reference [2]. In the previous study, two sets of tissue samples obtained from two independent breast studies (including the one used in this study) were combined for the discrimination analysis. Most results shown here for the statistically significant differences between malignant and non-malignant breast tissues are consistent with those obtained from the combined sample set investigated in the previous study, with the exception that the total hemoglobin concentration displayed a significant difference (p < 0.05) between malignant and non-malignant breast tissues for this sample set, while this was not observed for the combined sample set in the previous study. This demonstrates that the results obtained with the Monte Carlo based inverse models are consistent across data collected with different instruments and probes.
The PCs extracted from the PLS analysis can be correlated with the tissue properties extracted from the model based analysis. For fluorescence, the 340nm PC1, for example, had significant positive correlation with the fluorescence contribution of collagen, and significant negative correlation with the fluorescence contribution of retinol. As shown in Fig. 1(b), 340nm PC1 displayed positive values within the wavelength range of 360-460 nm, with a maximum at around 390 nm, and negative values over the wavelengths above 460 nm, with a minimum at around 520 nm. Since the PCs primarily account for the difference in spectral line shape observed between malignant and non-malignant tissue samples, this may suggest that the fluorescence intensity over the wavelength range of 360-460 nm was higher, while the fluorescence intensity over wavelengths above 460 nm was lower, for malignant than for non-malignant tissues. The spectral features over 360-460 nm characterize the collagen fluorescence, and a larger PC score may indicate higher collagen fluorescence. The spectral features over the wavelengths above 460 nm characterize retinol fluorescence; however, a larger PC score may indicate lower retinol fluorescence, as the PC is negative over this wavelength range. This explains the positive correlation of 340nm PC1 with the fluorescence contribution of collagen, and the negative correlation with the fluorescence contribution of retinol. The 360nm PC2 was most significantly correlated with the fluorescence contribution of NADH. Fig. 1(b) shows that 360nm PC2 had a shoulder over the wavelengths between 430-510 nm, which coincides with the fluorescence emission maximum of NADH. Larger PC scores suggest higher NADH fluorescence; thus a positive correlation between the PC score and the fluorescence contribution of NADH is expected.
For reflectance, the Refl PC1 was most significantly correlated with the mean reduced scattering coefficient. This PC has positive values over the entire spectrum; thus a larger PC score suggests a higher overall spectral intensity. Increased scattering in the medium will also result in a diffuse reflectance spectrum of higher intensity. Thus an increase in the score of Refl PC1 may reflect an increase in spectral intensity that can partially result from an increasing mean reduced scattering coefficient. The negative correlation observed between Refl PC1 and β-carotene concentration may be attributed to the negative correlation between β-carotene concentration and mean reduced scattering coefficient, since the former increases while the latter decreases with increasing adipose tissue content. Refl PC2 displayed significant positive correlation with β-carotene concentration. This PC displayed an apparent valley over the wavelength range of 430-520 nm, which coincides with the absorption band of β-carotene. The larger the PC score, the deeper the valley, suggesting more absorption by β-carotene.
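The screening rule used for Table 8 (report the coefficient if p < 0.01, mark it with ** if p < 1e-6, otherwise show Ø) can be sketched as follows. The text does not state which correlation coefficient or p-value calculation was used, so the Pearson coefficient and the Fisher z-transform approximation below are assumptions. The function names are hypothetical and the data are synthetic, with only the sample count (83 = 37 + 46) taken from the text.

```python
from math import atanh, sqrt
from statistics import NormalDist, mean


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided p-value via the Fisher z-transform (large-sample approximation)."""
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite for |r| = 1
    z = atanh(r) * sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def table_entry(pc_scores, tissue_property, alpha=0.01, strong=1e-6):
    """Format one table cell: coefficient, coefficient plus '**', or 'Ø'."""
    r = pearson_r(pc_scores, tissue_property)
    p = approx_p_value(r, len(pc_scores))
    if p >= alpha:
        return "Ø"
    return f"{r:.2f}" + ("**" if p < strong else "")


# Synthetic placeholders for 83 samples (NOT measured data).
scores = [float(i) for i in range(83)]
noise = [(4 * i) % 11 - 5 for i in range(83)]          # deterministic pseudo-noise
related = [0.5 * s + e for s, e in zip(scores, noise)]  # tracks the PC score
unrelated = [float(e) for e in noise]                   # does not track it

print(table_entry(scores, related), table_entry(scores, unrelated))
```

A sign-aware entry of this kind makes the positive and negative correlations discussed above directly readable from the table.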
In summary, the PC1s of fluorescence spectra measured at 340, 360 and 380 nm primarily reflect the fluorescence from collagen and retinol, while the PC2s primarily reflect the NADH fluorescence. The PC1 of diffuse reflectance spectra is most related to the scattering property, while the PC2 is primarily related to β-carotene concentration and hemoglobin saturation. For the sample set investigated in this study, the classification based on PCs and that based on intrinsic tissue properties provided comparable classification accuracy. This suggests that both the linear (PLS) and non-linear (Monte Carlo) methods extract similar features from the tissue spectra for the diagnosis of breast cancer and that one method is not superior to the other in this respect.
Each approach has its advantages and disadvantages. One advantage of the empirical approach is that it is not computationally intensive: it involves only linear regression, and the spectra of all samples are pooled together for processing. The model based approach, by contrast, is computationally intensive, as it involves a recursive procedure for model optimization and each tissue spectrum has to be processed individually. One disadvantage of the empirical approach is that a finite number of tissue spectra from both tissue types are needed to extract the principal components, and a change in the sample pool (e.g., exclusion of some samples or inclusion of new samples) will result in a different set of principal components. In this study, the empirical analysis was not performed for the discrimination between malignant and fibrous/benign samples after the inclusion of additional tissue samples from breast reduction surgery, because the PLS analysis on the new sample set would produce a set of PCs different from the ones presented earlier (i.e., those extracted from the cancer surgery samples), making it difficult to relate and compare the new set of PCs with the other results presented in this study. This, however, is not a problem with the model based approach: since the model based feature extraction is performed on each individual sample, adding new samples for further analysis is straightforward.
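The dependence of the empirical components on the sample pool can be illustrated with a small sketch. It assumes a single-response PLS with mean-centred data, for which the first weight vector is proportional to X^T y. The text does not specify the PLS variant the authors used, and the function name `first_pls_weights` and the three-wavelength toy spectra are hypothetical.

```python
from math import sqrt


def first_pls_weights(spectra, labels):
    """First PLS weight vector, w proportional to Xc^T yc (mean-centred X and y)."""
    n = len(spectra)
    n_wavelengths = len(spectra[0])
    col_means = [sum(s[j] for s in spectra) / n for j in range(n_wavelengths)]
    y_mean = sum(labels) / n
    w = [
        sum((spectra[i][j] - col_means[j]) * (labels[i] - y_mean) for i in range(n))
        for j in range(n_wavelengths)
    ]
    norm = sqrt(sum(v * v for v in w))
    return [v / norm for v in w]


# Toy spectra at three wavelengths (NOT measured data); label 1 = malignant.
pool = [[2.0, 1.0, 0.5], [2.2, 1.1, 0.4], [1.0, 1.0, 1.0], [0.9, 1.2, 1.1]]
pool_labels = [1, 1, 0, 0]

# Two additional non-malignant samples change the pooled statistics.
extra = [[1.0, 2.0, 1.0], [1.1, 2.1, 0.9]]
extra_labels = [0, 0]

w_before = first_pls_weights(pool, pool_labels)
w_after = first_pls_weights(pool + extra, pool_labels + extra_labels)
print([round(v, 3) for v in w_before])
print([round(v, 3) for v in w_after])
```

Because the weights are computed from pooled, label-weighted statistics, adding samples shifts the component itself, whereas a per-sample model fit leaves previously extracted properties unchanged.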
In conclusion, we have presented in this study the use of both an empirical and a Monte Carlo model based approach for the analysis of tissue fluorescence and diffuse reflectance spectra, and demonstrated that classification based on both approaches provided comparable classification accuracy for discriminating breast malignancy. We also showed that there are significant correlations between the PCs extracted from the empirical spectral analysis and the intrinsic tissue properties extracted from the model based analysis, suggesting that both approaches may probe the same spectroscopic contrast in the tissue that discriminates between malignant and non-malignant breast tissues, albeit in different ways. While the empirical spectral analysis provides a straightforward means to examine the difference in the fluorescence and diffuse reflectance spectra of various tissue types, the model based analysis allows for a quantitative assessment of the physiological and biochemical information about the tissue property, thus providing insights into the biological basis of the spectral features that are observed in the tissue spectra.

Fig. 1.
Figure 1 shows (a) average fluorescence spectra of malignant (n = 37) and non-malignant (n = 46) tissue samples at the excitation wavelength of 340 nm, (b) 340nm PC1 and 360nm PC2 obtained from the fluorescence spectra, (c) the correlation of 340nm PC1 with the extracted fluorescence contribution of collagen and retinol, and (d) the correlation of 360nm PC2 with the extracted fluorescence contribution of NADH. Fig. 1(a) shows that the fluorescence spectra of malignant tissues displayed two peaks at 390 nm and 450 nm respectively, and that non-malignant tissue fluorescence displayed a shoulder within 460-520 nm. Fluorescence intensity in malignant tissues is higher within the wavelength range of 360-460 nm and lower within 470-620 nm, as compared to that in non-malignant tissues. In Fig. 1(b), the 340nm PC1, which accounts for 83.1% of data variance, displayed positive values within the wavelength range of 360-460 nm, and negative values over the wavelengths above 460 nm. The 360nm PC2, which accounts for 15.6% of data variance, displayed a shoulder with positive values between 430-510 nm, and negative values over wavelengths below 430 nm and above 510 nm. In Fig. 1(c), 340nm PC1 is significantly and positively correlated with the relative fluorescence contribution of collagen, while negatively correlated with the relative fluorescence contribution of retinol. In Fig. 1(d), the 360nm PC2 displayed significant positive correlation with NADH fluorescence.

Fig. 2.
Figure 2 shows (a) average calibrated diffuse reflectance spectra of malignant (n = 37) and non-malignant (n = 46) tissue samples, (b) PC1 and PC2 obtained from the diffuse reflectance spectra (Refl PC1 and Refl PC2), (c) the correlation of Refl PC1 with the mean reduced scattering coefficient μs', (d) the correlation of Refl PC1 with β-carotene concentration, (e) the correlation of Refl PC1 with hemoglobin saturation, and (f) the correlation of Refl PC2 with β-carotene concentration. In Fig. 2(b), Refl PC1 has positive values over the entire spectrum, with a dip at a wavelength of around 435 nm. The spectral line shape of Refl PC2 featured an apparent valley with negative values over the wavelength range of 430-520 nm. Refl PC1 was positively correlated with the mean reduced scattering coefficient (Fig. 2(c)), while negatively correlated with the extracted β-carotene concentration (Fig. 2(d)). Refl PC1 also displayed negative correlation with hemoglobin saturation, as shown in Fig. 2(e). Fig. 2(f) shows that Refl PC2 had positive correlation with β-carotene concentration.

Table 1. (a) Histological breakdown of the breast samples investigated in this study; and (b) distribution of percent malignancy in malignant samples.

Table 2. PCs identified from the Wilcoxon rank-sum test as displaying statistically significant differences between malignant and other non-malignant breast tissues (at a significance level of p < 0.05). The variance that each PC accounts for is also listed in the table.

Table 5. Results from the Wilcoxon rank-sum test on extracted absorption, scattering and fluorescence properties for the statistically significant difference (at least p < 0.05) between (1) malignant and fibrous/benign, (2) malignant and adipose, and (3) malignant and non-malignant breast tissues (fluorescence properties are marked with *).

Table 6. Results from the leave-one-out cross validation of a linear SVM classification for discriminating malignant from non-malignant breast tissue samples using (1) fluorescence properties only (relative fluorescence contribution of collagen, NADH and retinol), (2) absorption and scattering properties only (mean μs', β-carotene concentration and hemoglobin saturation), and (3) the combination of fluorescence, absorption and scattering properties in (1) and (2).

Table 7. Fraction of misclassified samples of each tissue category. The sample number of each misclassified sample is listed in parentheses. Malignant samples are broken down by percentage of malignancy and non-malignant samples are broken down by tissue type.

Table 8. (a) Correlation between fluorescence PCs and the extracted fluorophore properties; and (b) correlation between reflectance PCs and the absorption and scattering properties. Correlation was considered significant if p < 0.01, in which case the correlation coefficient is shown in the table; otherwise a symbol Ø is shown, indicating no significant correlation. The correlation coefficients marked with ** have a p-value of p < 1e-6.

Table 10. Diagnostic accuracy of a linear SVM classification for discriminating malignant from fibrous/benign breast tissues, using (1) diagnostically significant fluorescence properties only, (2) diagnostically significant absorption and scattering properties only, and (3) the combined tissue properties.