Intra-class variability in diffuse reflectance spectroscopy: application to porcine adipose tissue

Optical diffuse reflectance spectroscopy (DRS) has great potential in the study, diagnosis, and discrimination of biological tissues. Discrimination is based on large numbers of measurements that make up training sets. These sets are then used to classify tissues according to the biomedical application. Classification accuracy depends strongly on the training dataset, which typically comes from different samples of the same class, and from different points of the same sample. The variability of these measurements is not usually considered and is assumed to be purely random, although it could greatly influence the results. In this work, spectral variations within and between samples of ex-vivo porcine adipose tissue from different animals are evaluated. Algorithms for normalization, dimensionality reduction by principal component analysis, and variability control are applied. The principal component analysis reveals variability in the dataset, even when a variability removal algorithm is applied. The projected data appear grouped by animal in the PC space. The Mahalanobis distance is calculated for every group, and an ANOVA test is performed in order to estimate the variability. The results confirm that the variability is not random and depends at least on the anatomical location and the specific animal. The magnitude of the variability is significant, particularly if high classification accuracy is required. As a consequence, it should generally be taken into account in classification problems.

© 2018 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

OCIS codes: (170.6510) Spectroscopy, tissue diagnostics; (170.6935) Tissue characterization; (170.4580) Optical diagnostics for medicine.

Vol. 9, No. 5 | 1 May 2018 | BIOMEDICAL OPTICS EXPRESS 2297, #325990, https://doi.org/10.1364/BOE.9.002297
Received 13 Mar 2018; revised 5 Apr 2018; accepted 9 Apr 2018; published 20 Apr 2018

References and links
1. T. Vo-Dinh, ed., Biomedical Photonics Handbook: Biomedical Diagnostics (CRC Press, 2014).
2. M. E. Brezinski, G. J. Tearney, B. Bouma, S. A. Boppart, C. Pitris, J. F. Southern, and J. G. Fujimoto, “Optical biopsy with optical coherence tomography,” Ann. N. Y. Acad. Sci. 838(1), 68–74 (1998).
3. F. Fanjul-Vélez, M. Pircher, B. Baumann, E. Götzinger, C. K. Hitzenberger, and J. L. Arce-Diego, “Polarimetric analysis of the human cornea measured by Polarization-Sensitive Optical Coherence Tomography,” J. Biomed. Opt. 15(5), 056004 (2010).
4. A. García-Uribe, N. Kehtarnavaz, G. Marquez, V. Prieto, M. Duvic, and L. V. Wang, “Skin cancer detection by spectroscopic oblique-incidence reflectometry: classification and physiological origins,” Appl. Opt. 43(13), 2643–2650 (2004).
5. I. Salas-García, F. Fanjul-Vélez, and J. L. Arce-Diego, “Superficial radially-resolved fluorescence and three-dimensional photochemical time-dependent model for Photodynamic Therapy,” Opt. Lett. 39, 1845–1848 (2014).
6. N. Ortega-Quijano, F. Fanjul-Vélez, J. de Cos-Pérez, and J. L. Arce-Diego, “Analysis of the depolarizing properties of normal and adenomatous polyps in colon mucosa for the early diagnosis of precancerous lesions,” Opt. Commun. 284, 4852–4856 (2011).
7. F. Koenig, R. Larne, H. Enquist, F. J. McGovern, K. T. Schomacker, N. Kollias, and T. F. Deutsch, “Spectroscopic measurement of diffuse reflectance for enhanced detection of bladder carcinoma,” Urology 51(2), 342–345 (1998).
8. G. Zonios, L. T. Perelman, V. Backman, R. Manoharan, M. Fitzmaurice, J. Van Dam, and M. S. Feld, “Diffuse reflectance spectroscopy of human adenomatous colon polyps in vivo,” Appl. Opt. 38(31), 6628–6637 (1999).
9. G. Bale, S. Mitra, J. Meek, N. Robertson, and I. Tachtsidis, “A new broadband near-infrared spectroscopy system for in-vivo measurements of cerebral cytochrome-c-oxidase changes in neonatal brain injury,” Biomed. Opt. Express 5(10), 3450–3466 (2014).
10. F. Stelzle, K. Tangermann-Gerk, W. Adler, A. Zam, M. Schmidt, A. Douplik, and E. Nkenke, “Diffuse reflectance spectroscopy for optical soft tissue differentiation as remote feedback control for tissue-specific laser surgery,” Lasers Surg. Med. 42(4), 319–325 (2010).
11. G. J. Greening, H. M. James, M. K. Dierks, N. Vongkittiargorn, S. M. Osterholm, N. Rajaram, and T. J. Muldoon, “Towards monitoring dysplastic progression in the oral cavity using a hybrid fiber-bundle imaging and spectroscopy probe,” Sci. Rep. 6, 26734 (2016).
12. K. Xu, Q. Qiu, J. Jiang, and X. Yang, “Non-invasive glucose sensing with near-infrared spectroscopy enhanced by optical measurement conditions reproduction technique,” Opt. Lasers Eng. 43(10), 1096–1106 (2005).
13. R. J. Barnes, M. S. Dhanoa, and S. J. Lister, “Standard normal variate transformation and de-trending of near-infrared diffuse reflectance spectra,” Appl. Spectrosc. 43(5), 772–777 (1989).
14. Y. Zhu, T. Fearn, D. Samuel, A. Dhar, O. Hameed, S. G. Bown, and L. B. Lovat, “Error removal by orthogonal subtraction (EROS): a customised pre-treatment for spectroscopic data,” J. Chemometr. 22(2), 130–134 (2008).
15. S. L. Jacques, “Optical properties of biological tissues: a review,” Phys. Med. Biol. 58(11), R37–R61 (2013).
16. Q. Cao, N. G. Zhegalova, S. T. Wang, W. J. Akers, and M. Y. Berezin, “Multispectral imaging in the extended near-infrared window based on endogenous chromophores,” J. Biomed. Opt. 18(10), 101318 (2013).
17. A. N. Bashkatov, E. A. Genina, V. I. Kochubey, and V. V. Tuchin, “Optical properties of human skin, subcutaneous and mucous tissues in the wavelength range from 400 to 2000 nm,” J. Phys. D Appl. Phys. 38(15), 2543–2555 (2005).
18. E. Zamora-Rojas, B. Aernouts, A. Garrido-Varo, D. Pérez-Marín, J. E. Guerrero-Ginel, and W. Saeys, “Double integrating sphere measurements for estimating optical properties of pig subcutaneous adipose tissue,” Innov. Food Sci. Emerg. 19, 218–226 (2013).
19. A. M. Goodpaster and M. A. Kennedy, “Quantification and statistical significance analysis of group separation in NMR-based metabonomics studies,” Chemom. Intell. Lab. Syst. 109(2), 162–170 (2011).


Introduction
Optical diagnostic techniques are widely employed because they provide non-ionizing, non-invasive or minimally invasive (by endoscopy), high-resolution and high-contrast diagnostic information [1]. Several techniques are employed for optical biopsy, such as microscopy, Optical Coherence Tomography (OCT) [2] and its variants [3], spectroscopy [4], fluorescence [5], or even polarimetry [6]. Diffuse reflectance spectroscopy (DRS) is a relatively easy-to-implement optical technique that provides information on tissue morphology, functionality, and/or biochemical composition. DRS has been applied mainly to the detection of cancerous tissues, such as bladder [7], colon [8], brain [9] or skin [4], among others.
Clinical applications of DRS consist of a classification problem, in which each sample is assigned to either the healthy or the diseased tissue class. Tissue discrimination can also be implemented [10]. Statistically significant classifications are supported by large numbers of spectral measurements of each class. Those measurements come from different samples of the same animal, and from different animals. When different spectra are measured even on the same sample, for instance at different points, spectral variability appears. This variability is due to anatomical differences in the sample, instrument variation and/or slight variations in the position of the sample with respect to the spectroscopic system. This effect has been demonstrated experimentally [11,12]. Adequately quantifying the magnitude of this spectral variability, and establishing its non-random character, is essential for any optical diagnostic technique based on DRS. However, previous works have not included a detailed variability study of the spectra of a particular tissue at different locations and in different animals, usually assuming the variability to be random and, consequently, easily suppressible or negligible. An acceptable classification requires intra-class variability to be negligible when compared with inter-class variability.
The purpose of this work is to evaluate the intra-class variability in DRS, particularly applied to ex-vivo porcine adipose tissue. Healthy samples from different parts of the same animal, and from different animals, are measured. The usual algorithms for normalization and dimension reduction are implemented. A variability reduction algorithm is also applied in order to test its efficacy in the classification problem. Intra-class variability is estimated and quantified on two different scales.
The paper is organized as follows. Section 2 gives an overview of the sample extraction methodology, the optical experimental setup, and the data pre-processing algorithms. In section 3 the results are obtained and analyzed, particularly for spectral variability. Conclusions appear in the final section.

Tissue
Healthy Large White pigs, 2-4 months old and weighing between 20 and 25 kg, were used in the experiments. Ex-vivo fat tissue samples were taken from four different pigs after euthanasia; two samples were extracted from each pig. The procedure was performed at the hospital by a veterinary surgeon. Each sample offered an accessible measurement surface of 30 × 30 mm². After dissection, the tissues were carefully rinsed with a sodium chloride solution (9 mg/ml) and then wrapped in soaked gauze. The samples were stored at 6 °C for up to 12 hours until processed. The protocol was approved by the Valdecilla Hospital Ethics Committee (Santander, Spain).

Optical setup
Figure 1 shows the experimental setup. A white light source (QTH 66499, Newport Corporation) illuminates the tissue after collimation by fused silica lenses (L1 and L2). Diffusely reflected light from the tissue is collected by a lens, L3, and focused into an optical fiber with a 1 mm core diameter. The fiber is connected to a spectrometer (BLK-CXR-SR-50, StellarNet Inc.) with an optical resolution of 0.5 nm and a 2048-pixel CCD/PDA detector with 14 × 200 μm pixels. The observation angle is 30°. Each adipose tissue sample is scanned at 16 points arranged in a 4 × 4 matrix with a separation of 6 mm, using a micrometric translation stage. Four spectra were captured per location, so a total of 512 spectra were processed (4 animals × 2 samples × 16 points × 4 spectra).

Data pre-processing
Pre-treatment is commonly used to remove systematic noise due to undesirable variations (arising, for example, from instrument artifacts and from the sample itself), and to emphasize the variations of interest. Linear trends and noise are removed with a Savitzky-Golay filter. Each spectrum is normalized by subtracting its mean and dividing by its standard deviation [13]. Principal Component Analysis (PCA) is employed to reduce dimensionality. PCA transforms possibly correlated variables into an equal or smaller number of uncorrelated (orthogonal) variables, the principal components (PCs).
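The normalization and dimension-reduction steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the processing code used in this work: the spectra are synthetic stand-ins, the function names `snv` and `pca` are hypothetical, PCA is computed through a singular value decomposition, and the Savitzky-Golay smoothing step is omitted (in Python it would typically be done with `scipy.signal.savgol_filter`).

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) on its own."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std

def pca(data, n_components):
    """PCA via SVD of the column-centred data.

    Returns the scores, the loadings (one principal component per row)
    and the fraction of variance explained by each retained component.
    """
    centred = data - data.mean(axis=0)
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    loadings = vt[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, loadings, explained[:n_components]

# Synthetic stand-in data (assumption): 40 spectra sampled at 200 wavelengths.
rng = np.random.default_rng(0)
wavelengths = np.linspace(250.0, 1100.0, 200)
band = np.exp(-((wavelengths - 540.0) / 80.0) ** 2)
gain = 1.0 + 0.2 * rng.standard_normal((40, 1))
spectra = gain * band + 0.5 + 0.01 * rng.standard_normal((40, 200))

normalized = snv(spectra)
scores, loadings, explained = pca(normalized, n_components=3)
print("explained variance ratio:", np.round(explained, 3))
```

Each row of `scores` is one spectrum expressed in the reduced PC space, which is the representation on which grouping by animal or by sample would be inspected.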

Group variability reduction
Spectral variability is usually further minimized by a correction process. One of the methods employed in spectroscopy is orthogonal subtraction [14]. The method projects the data onto an orthogonal subspace where the variability is contained, and the projections are subtracted. Let $P$ be a $D \times K$ matrix whose columns form a basis of $K$ mutually orthonormal vectors $p_k \in \mathbb{R}^{D}$. The projection of the data set $X$ onto this basis is subtracted from the original data:

$$\tilde{X} = X - X P P^{T} = X \left( I - P P^{T} \right),$$

where $I$ is the identity matrix. $P$ can be chosen in different ways, for instance with the first $k$ eigenvectors of a matrix $W$, the pooled within-sample covariance matrix of the data set, which describes the variability between spectra coming from the same animal. With $Z_i$ the matrix of mean-centered spectra of the $i$-th animal, $m$ the total number of animals, and $r$ the number of spectra per animal:

$$W = \frac{1}{m\,(r-1)} \sum_{i=1}^{m} Z_i^{T} Z_i .$$
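A possible implementation of the orthogonal subtraction step is sketched below. It is a hedged illustration rather than the code used for the reported results: the function name `eros_correct`, the `groups` label array and the synthetic data are assumptions, and the pooled within-group covariance is normalized by the total within-group degrees of freedom, which is one common convention.

```python
import numpy as np

def eros_correct(X, groups, k):
    """Remove the k leading within-group directions from X (rows are spectra).

    X      : (N, D) data matrix
    groups : (N,) integer label of the group (e.g. animal) of each row
    k      : number of within-group eigenvectors to subtract
    """
    D = X.shape[1]
    W = np.zeros((D, D))
    dof = 0
    for g in np.unique(groups):
        Z = X[groups == g]
        Z = Z - Z.mean(axis=0)          # centre each group on its own mean
        W += Z.T @ Z
        dof += Z.shape[0] - 1
    W /= dof                            # pooled within-group covariance

    eigvals, eigvecs = np.linalg.eigh(W)    # eigenvalues in ascending order
    P = eigvecs[:, ::-1][:, :k]             # D x k, largest eigenvalues first
    X_corr = X @ (np.eye(D) - P @ P.T)      # subtract the projection onto P
    return X_corr, P

# Synthetic example (assumption): 4 groups, 10 spectra each, 6 variables,
# with a strong within-group nuisance direction along the first variable.
rng = np.random.default_rng(1)
m, r, D = 4, 10, 6
groups = np.repeat(np.arange(m), r)
group_means = 3.0 * rng.standard_normal((m, D))
nuisance = np.zeros(D)
nuisance[0] = 1.0
X = (group_means[groups]
     + 5.0 * rng.standard_normal((m * r, 1)) * nuisance
     + 0.1 * rng.standard_normal((m * r, D)))

X_corr, P = eros_correct(X, groups, k=1)
```

Because the columns of `P` are orthonormal, the corrected data have no component left along the subtracted directions; any variability that remains lies outside the subspace spanned by the first `k` within-group eigenvectors.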

Biological tissues spectra
Absorption in the UV/visible range in biological tissues is mainly caused by hemoglobin, water molecules or macromolecules, such as proteins and pigments. Two absorption bands characterize the deoxygenated hemoglobin state, with maxima at 425 and 555 nm. In the oxygenated state, hemoglobin has three absorption bands, whose maxima are at 415, 540 and 575 nm [15]. Absorption in the IR region can be mainly attributed to water molecules and lipids [16]. Absorption peaks due to water have been measured at 970, 1197, 1430 and 1925 nm [17]. Lipids, such as those present in fat, have shown absorption peaks at 930, 1210-1212, 1720-1760 and 2120-2140 nm [18]. Figure 2(a) shows a set of spectra from animal number 1. A and B denote two tissue samples from different anatomical locations of the same animal, and each spectrum represents a point of the measurement matrix. Spectra from the same animal present significant variability, as can be seen in the figure. After normalization and filtering, a remarkable variability remains, visible as the standard deviation at the hemoglobin absorption peaks (415, 540 and 575 nm), see Fig. 2(b). The 970 nm water peak can also be seen. Data above 1100 nm and below 250 nm are discarded, as they present a high noise level.

Dimension reduction
Each recorded spectrum has 1700 points, from 250 nm to 1100 nm, every 0.5 nm. After the PC projection, the first three principal components can be seen in Fig. 3(a).

Group variability correction
The correction is applied with $k = 2$, and the mean value of the standard deviation is reduced by 73%. Figure 3(c) shows the PC scores after the correction, where the variability still exists. For the original uncorrected data, see Fig. 3(b), the first principal component represents 82% of the data set variance, while after the correction it represents just 52% of the data set variance.

Inter-class and intra-class variability quantification
Each spectrum is projected onto 10 PC scores, and the centroid of each class is calculated. The distance between each observation and the centroid of its class is measured by the Mahalanobis distance [19], which takes into account the covariance and the scales of the variables. For an observation $\vec{x}_n$:

$$D_M(\vec{x}_n) = \sqrt{(\vec{x}_n - \vec{\mu})^{T}\, \Sigma^{-1}\, (\vec{x}_n - \vec{\mu})},$$

where $\Sigma$ is the covariance of the data and $\vec{\mu}$ the mean of the distribution. The statistical distribution of Mahalanobis distances with respect to the centroid of each class appears in Fig. 4(a). The boxes show a similar median (horizontal orange line) and a similar interquartile range for every class. Although the boxplot shows a quite symmetric distribution around the centroid for most of the samples, some measurements lie quite far from the standard deviation area. This reinforces the evidence of significant intra-class variability.
The Mahalanobis distance between all observations and the centroid of one class is also calculated. Figure 4(b) shows the boxplot of the distances of all classes calculated with respect to centroid 1 (results with respect to centroids 2 to 4 are similar). It can be seen that the distances of class 1 are the lowest, as expected, while the other classes are far away from centroid 1. In order to quantify whether the classes are significantly separated, a one-way ANOVA test is performed. The test, reported in Table 1, reveals that the classes are significantly separated in the space, as all the p-values are below 0.05.
In order to quantify the relative impact of this variability on a real classification problem, Mahalanobis distances for five different tissues (fat, bone, muscle, nerve and skin) were calculated, see Fig. 5. Mean values were on the order of 8 to 20 between tissues, significant magnitudes compared with Fig. 4(b).
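The distance computation can be illustrated with a short sketch. It is not the analysis code behind Fig. 4: the function name `mahalanobis_to_centroid` is hypothetical, the two classes are synthetic three-dimensional stand-ins for PC scores, and the covariance is assumed to be estimated from the reference class only, which is one of several reasonable choices.

```python
import numpy as np

def mahalanobis_to_centroid(points, reference):
    """Mahalanobis distance of each row of `points` to the centroid of `reference`.

    The centroid and the covariance are both estimated from `reference`
    (assumption: one covariance matrix per reference class).
    """
    mu = reference.mean(axis=0)
    cov = np.cov(reference, rowvar=False)
    cov_inv = np.linalg.inv(cov)
    diff = points - mu
    # Quadratic form diff_i^T * cov_inv * diff_i, evaluated for every row i
    squared = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return np.sqrt(squared)

# Synthetic stand-in scores (assumption): two classes in a 3-D PC space.
rng = np.random.default_rng(2)
class_a = rng.standard_normal((100, 3))
class_b = rng.standard_normal((100, 3)) + np.array([6.0, 0.0, 0.0])

d_aa = mahalanobis_to_centroid(class_a, class_a)  # intra-class distances
d_ba = mahalanobis_to_centroid(class_b, class_a)  # class B seen from centroid A
print("median within class A:", np.median(d_aa))
print("median of class B to centroid A:", np.median(d_ba))
```

Comparing the two sets of distances mirrors the comparison between the distances of a class to its own centroid and the distances of the other classes to that same centroid.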

Conclusion
Classification with DRS is based on large data sets from different samples and animals, which afterwards serve to train a classifier. Classification efficiency depends strongly on intra-class variability. Little attention is usually paid to intra-class variability, as it is assumed to be random and, consequently, easily suppressible or negligible. In this work intra-class variability has been shown to be not random, but rather dependent at least on the anatomical location and the specific animal. Several optical reflectance spectra of ex-vivo porcine adipose tissue from different samples and animals have been measured. The usual normalization procedures have been applied. Variability between data coming from different samples and animals has been found, evidenced in the PCA space, and measured by using the Mahalanobis distance and a one-way ANOVA test. Even after numerical variability removal algorithms are applied, variability still exists. Intra- and inter-class variability in the optical reflectance spectrum of adipose tissue has been measured. The results reported suggest that intra-class variability must be evaluated for the specific problem, rather than assumed to be random. Its quantification against a tissue classification problem confirms its significance. As a consequence, this analysis can be relevant for classification and discrimination problems in biological tissues employing DRS.

Fig. 2. (a) Measurements of two samples (A and B) of animal number 1; (b) Mean spectrum and standard deviation at significant points after normalization.
In Fig. 3(a), the 415 nm peak in PC1 and the peaks at 540 and 575 nm in PC2 correspond to local minima in the spectrum (hemoglobin absorption). The PC3 peak at 488 nm and the PC2 peak at 716 nm indicate the change in concavity of the curve. The first ten principal components represent 99.5% of the data set variance. Figure 3(b) shows the first two scores of the data set. Two samples (A and B) of four animals (1, 2, 3 and 4) are represented in the scatter plot. We consider each animal as a class, so a total of four classes are defined. It can be noticed that each class forms a cluster, regardless of the anatomical region. This fact indicates a variability between animals (inter-class). Furthermore, points of the same sample (A or B) of the same animal also tend to group, indicating variability between anatomical locations (intra-class).

Fig. 3. (a) First three principal components. (b) PCA scores plot for the data set (first and second components). (c) PCA scores plot for the corrected data set. The first and second scores are shown.

Fig. 5. Boxplot of distance distribution for classification analysis; median (horizontal orange line), interquartile range (rectangle) and standard deviation around the centroid (range bar).