Tissue intrinsic fluorescence recovering by an empirical approach based on the PSO algorithm and its application in type 2 diabetes screening

: In order to reduce the influence of scattering and absorption on tissue fluorescence spectra, after tissue fluorescence and diffuse reflectance in different tissue optical properties were simulated by the Monte Carlo method, a tissue intrinsic fluorescence recovering algorithm making use of diffuse reflectance spectrum was developed. The empirical parameters in the tissue intrinsic fluorescence recovering algorithm were coded as a particle in the solution domain, the classification performance was defined as the fitness, and then a particle swarm optimization (PSO) algorithm was established for empirical parameters optimization. The skin autofluorescence and diffuse reflectance spectra of 327 subjects were collected in Anhui Provincial Hospital. The skin intrinsic autofluorescence spectra were recovered by using the empirical approach and the integration area of the spectra were calculated as fluorescence intensity. Receiver operating characteristic (ROC) analysis for fluorescence intensity was applied to evaluate the classification performance in type 2 diabetes screening. In addition, a support vector machine (SVM) method was implemented to improve the performance of the classification. The results showed that the sensitivity and specificity were 32% and 76% respectively, and the area under the curve was 0.54 before recovering, while the sensitivity and specificity were 72% and 86% respectively, and the area under the curve was 0.86 after recovering. Furthermore, the sensitivity and specificity increased to 83% and 86% respectively when using linear SVM while 84% and 88%, respectively, when using nonlinear SVM. The results indicate that using the tissue fluorescence spectrum recovery algorithm


Introduction
Fluorescence, the re-emission of light by fluorophore that has absorbed a shorter wavelength light, is widely used to investigate biological tissues [1, 2]. The shape and intensity of the fluorescence spectrum contain valuable information on the identity and density of fluorophore in tissue. However, in samples with high scattering or absorption, such as tissue, both the shape and intensity of measured fluorescence spectrum can be heavily distorted, and then resulting in the measured fluorescence spectrum may not be proportional to fluorophore concentration [3]. Untangling the effects of this attenuation on the measured fluorescence spectrum is necessary for truly quantitative analysis.
Several methods have been reported to disentangle the effects of absorption and scattering from the measured fluorescence spectrum to recover the intrinsic fluorescence spectrum. Bradley et al. reviewed over 50 different publications that addressed the recovery of intrinsic fluorescence spectrum [4]. These studies have used theoretical methods based on physical models of light tissue interactions, including analytical approaches based on diffusion theory [5, 6] as well as computational techniques such as Monte Carlo simulations of photon transport in turbid media [7] or simple empirical approaches [8,9]. Although promising results have been obtained with various methods, thus far, all available methods have their limitation.
Diffusion theory is an approximation to the radiative transfer equation and it describes photons transporting in absorbing and scattering media. Thus it can correct the measured fluorescence spectrum based on the tissue absorption and scattering properties to extract the intrinsic fluorescence spectrum analytically. However, the diffusion approximation is only valid when the absorption coefficient of the medium is at least an order of magnitude lower than scattering, and the distances between sources and detectors are much greater than the mean free path of diffusing photons in that medium.
The Monte Carlo method is stochastic in nature and is not limited to the diffusion regime. It can therefore be used to model light transport over the entire UV-visible-NIR range. However, Monte Carlo simulations are computationally time-consuming and have historically not been very convenient to use as inverse models. Empirical approach, including subtraction techniques and ratio techniques, always employ the measurements of the reflected illumination intensity in order to reduce the effects of tissue optical properties on fluorescence. In subtraction techniques, attenuation in fluorescence due to tissue optical properties is compensated for by subtracting a proportion of the corresponding change in the reflected excitation light intensity from the fluorescence intensity. Subsequently, investigators argue that it would be more appropriate to utilize a ratio of fluorescence intensity to diffuse reflectance intensity rather than an algebraic subtraction. However, there has been no clear consensus on how to realize the ratio techniques and how to determine the form and the parameters of ratio equation. In addition, as the same to subtraction technique, ratio techniques also lack the model foundation.
Advanced glycation end products (AGE) are a complex and heterogeneous group of compounds that have been implicated in diabetes related complications [10]. Skin autofluorescence is related to the accumulation of AGE and has a potential to provide prognostic information of diabetes and its related complications [11,12]. While the skin autofluorescence is heavily distorted by the tissue scattering and absorption, resulting in the measured autofluorescence not be proportional to AGE accumulation. In this paper, we simulated the tissue fluorescence and diffuse reflectance at both the excitation and the emission wavelength in different tissue optical properties by Monte Carlo method and introduced a tissue intrinsic fluorescence recovering algorithm [13], which made use of a diffuse reflectance measurement taken at the same location. In order to ensure the recovering performance, a particle swarm optimization (PSO) algorithm was introduced to accomplish the empirical parameters optimization. Subsequently, we introduced this recovering algorithm to untangle the effects of tissue attenuation on the measured fluorescence spectrum and used support vector machine (SVM) to classify the recovered intrinsic fluorescence for type 2 diabetes screening.

Subjects
A total of 346 subjects (152 male, 194 female) were recruited in Anhui Provincial Hospital in 2014. Meanwhile, 19 subjects were excluded according to the exclusion criteria:1) subjects unwilling to comply with test specifications; 2) the site of the skin measured has scar, mosslike sclerosis spots, vitiligo, deformity and infectious skin disease. At last, 327 subjects (143 male, 184 female) were included. The mean age was (47 ± 17) years, mean height was (164 ± 8) cm, mean weight was (64.7 ± 11.4) kg and mean BMI was (23.6 ± 3.7) kg/m 2 . Oral glucose tolerance test (OGTT) was carried out for all subjects. And then, all subjects were divided into diabetes mellitus (DM) group and non-diabetic mellitus (NDM) group based on OGTT-2h-value (If OGTT-2h-value was greater than 11.1/L, determine the subject with diabetes, otherwise, determine non-diabetic). Ultimately, there were 208 subjects in DM group, and 119 subjects in NDM group. Additionally, skin autofluorescence was measured for both DM and NDM groups, everyone was measured three times and the average was taken as the final value. The trial was approved by the Medical Ethics Committee of Anhui Provincial Hospital, and informed consent form was obtained from the subjects.

Instrumentation
The optical system, as depicted in Fig. 1(a), has been described particularly in our previous work [14]. In short, it consisted of an ultraviolet light source as excitation source, a broadband light source, a trifurcated fiber-optic probe, and a compact charge-coupled device (CCD) spectrometer. In addition, there were other parts such as power supply units, control units, optical and mechanical parts. The above trifurcated fiber-optic probe consisted of two channels, as depicted in Fig. 1(b), identified as a and b. The channel a, which connected to the ultraviolet light source, was designed for fluorescence measurements. The channel b, which connected to the broadband light source, was used for reflectance measurements. The illumination fibers of channels a and b were arranged in a concentric circle at the probe distal end as Fig. 1(c). Moreover, the numerical aperture of all fibers was 0.22, and the core diameters of illumination and collection fibers were 200 and 600 μm, respectively.
Convolution was used to integrate over the illumination and collection fibers in order to determine the probability that a photon traveling a fixed distance and would be collected [15,16]. Considering current probes's geometry, the probability of collection of a photon traveling a net distance t r between the points of entering and exiting the medium is given by: lb r s r r = − − − , the factor of 6 accounts for the six illumination fibers surrounding the collection fiber, i r means the radius of the illumination fiber, c r means the radius of the collection fiber, s means the distance between the center of illumination and collection fibers, which is 500 μm, t r is equal to s .
Finally, the excitation wavelength was 370 nm, and the corresponding emission wavelength was 480 nm. The spectrometer was USB4000-UV-VIS fiber optic spectrometer of Ocean Optics and the range of integration times was 3.8 ms ~10 s. Figure 2 showed the flow chart of tissue fluorescence and diffusion reflectance collection. During the fluorescence measurement, excitation source was activated; while in diffusion reflectance spectra measurement, the broadband light source was activated. Ultimately, the fluorescence and diffuse reflectance spectra were transmitted to the computer.

Recovery algorithm
Tissue diffuse reflectance spectrum was measured at the same site with the fluorescence, and it could be employed to recover the distortion of fluorescence due to tissue absorption and scattering. To extract the intrinsic fluorescence from the extrinsic measured fluorescence, an empirical recovery algorithm based on diffuse reflectance spectrum was established. The central mission was to determine the equation form and the corresponding parameters.
The recovery algorithm described in Eq.
(2) was based on the assumption that the optical absorption at the excitation wavelength x λ was high relative to that at emission wavelength m λ . This was generally true in tissue if the excitation wavelength was in the UV to the blue end of the visible spectrum. As a result, most fluorophore absorption events occurred close to the source fiber. The migration paths of the fluorescence emission photons at m λ were then approximated by those of the reflectance photons at m λ , emitted and collected using the same fiber optic geometry. The tissue optical model for fluorescence and diffuse reflectance collection were shown in Fig. 3.
3. Tissue optical model (a) for fluorescence measurement; (b) for diffuse reflectance spectra measurement.

Particle swarm optimization (PSO) algorithm
The particle swarm is a population based stochastic algorithm for optimization which is based on social psychological principles [17,18]. In PSO algorithm, the particles are placed in the search space, and each evaluates the objective function at its current location. Then each particle determines its movement through the search space by combining some aspect of the history of its own current and best locations with those of one or more members of the swarm, with some random perturbations. The next iteration takes place after all particles are moved.
Eventually the swarm as a whole, like a flock of birds collectively foraging for food, is likely to move close to an optimum of the fitness function. Each individual in the particle swarm is composed of three D-dimensional vectors, where D is the dimensionality of the search space. These are the current position i x , the velocity i v , and the previous best position i p for each individual. In addition, for the particle swarm, there is the global best position g p . The process for implementing PSO is shown in Algorithm 1. Figure 4 showed the flow chart of PSO algorithm. 5. Identify the particle in the neighborhood with the best success so far, and assign its index to the variable g .
6. Change the velocity and position of the particle according to the following equation where ω is inertia weight; id v is particle velocity; id p is optimal individual particle; gd p is optimal global particle; 1 η , 2 η are learning factors; id x is particle location and rand is random number between 0 and 1.
7. If a criterion is met (usually a sufficiently good fitness or a maximum number of iterations), exit loop.

Support vector machine
Support vector machine (SVM) is a machine-learning method based on the principle of structural risk minimization and originally developed by Vapnik and Burges [19]. It has many unique advantages in solving small sample, nonlinear and high dimensional pattern recognition. The main mechanism of SVM is to hunt an optimal separating hyper plane that meets the classification requirements. The plane should ensure the required classification accuracy, as well as make the classification interval maximum. In theory, SVM can achieve the optimal classification for linearly separable problems. For nonlinear separable problems, they are firstly mapped into a high dimensional linearly separable space through a nonlinear mapping, and then traded as linearly separable problems. The kernel functions used in constructing nonlinear SVM classifiers are the polynomial function, the radial basic function (RBF), and the neural network function, etc. One of the most used kernel functions in the reported work is the RBF kernel defined as: During the classification using SVM, two aspects should be considered: how to choose the optimal input feature subset for SVM, and how to set the best kernel parameters. These two aspects are crucial, because the feature subset choice influences the appropriate kernel parameters and vice versa. Therefore, obtaining the optimal feature subset and SVM parameters must occur simultaneously. In this paper, skin autofluorescence was chosen as input feature. The parameters that should be optimized include penalty parameter C and the kernel function parameters such as parameter γ for the RBF kernel. To design a SVM, one must choose a kernel function, set the kernel parameters and determine the penalty parameter. Penalty parameter represents the compromise on training error and generalization ability. Cross-validation and grid-optimization methods are alternative to determine the kernel function and the optimal parameters. In this work, linear SVM and nonlinear SVM based on RBF kernel were used to classify the skin autofluorescence of control subjects and diabetes.

Tissue intrinsic fluorescence recovery algorithm
Monte Carlo method was used to simulate the tissue fluorescence and diffuse reflectance in different tissue optical properties. During the simulation, we made some assumptions. Firstly, the illumination fiber and the collection fiber were set to the same type of 200/220μm fiber with a numerical aperture (NA) of 0.22, and the separation between the centers of the illumination and the collection fibers was 0.3 mm. Secondly, the size of tissue was set to 4 mm × 4 mm, and the thickness was 1mm. For the excitation wavelength, we set tissue absorption coefficient ranging from 1 to 15 cm −1 and reduced scattering coefficient to 5.1cm −1 ; for the emission wavelength, we set tissue absorption coefficient is 1.1 cm −1 , and reduced scattering coefficient ranging from 10 to 150 cm −1 . The refractive index of tissue was 1.37, the anisotropic properties of tissue was 0.9 [20]. Because the scattering coefficient at excitation wavelength ,    It could be seen that most of the changes in fluorescence magnitude due to tissue absorption at the excitation wavelength and tissue scattering at the emission wavelength could be corrected by further dividing the raw fluorescence and by an empirical power function of diffuse reflectance at excitation wavelength and diffuse reflectance at emission wavelength respectively. Furthermore, we can extend singleness emission wavelength to emission spectrum specific tissue. So an empirical approach, described as Eq. (6), was introduced to recover the intrinsic fluorescence [13]. were calculated use the raw measured spectrum devided the collection probability. The k x and k m were semi-empirical parameters ranging from 0 ~1.

Recovery parameters optimization
Diabetes mellitus (DM) and impaired glucose tolerance (IGT) detection are conventionally based on glycemic criteria. Skin autofluorescence (SAF) is a noninvasive proxy of tissue accumulation of advanced glycation end products (AGE) which is considered to be a carrier of glycometabolic memory [21]. While the detected skin autofluorescence spectrum is heavily distorted due to emitted fluorescence being absorbed and scattered within the sample, and furthermore, the propagation of the excitation light is affected by the background absorption and scattering, they result in that the detected autofluorescence may not be proportional to fluorophore concentration. In order to disentangle the effects of absorption and scattering from the measured autofluorescence spectrum, the empirical algorithm described in section 3.1 was introduced, and then we used the PSO algorithm to optimize the recovery parameters. Table 1 showed the PSO parameters for recovery parameters optimization.  During the parameters optimization, recovered skin autofluorescence integrated intensity from 400 nm to 600 nm of both DM and NDM group subjects was analyzed by receiver operating characteristic (ROC), and the fitness of particle was defined as the area under the ROC curve (AUC); population size was set to 20; and the maximum number of iterations was set to 50. Figure 6 showed the optimization results of PSO algorithm. Figure 6(a) showed that the recovery parameters x k and m k tended to be stable after 17 times iteration, and their values were 0.61 and 0.43 respectively. In the corresponding, the best fitness of particle was 0.86 in Fig. 6(b). 0.86, significantly higher than the value of 0.54 before recovering. The sensitivity and specificity were 72% and 86% respectively, significantly better than the values of 32% and 76% before recovering.  Figure 8 showed the raw measured fluorescence spectrum and the recovered intrinsic fluorescence spectrum of both DM and NDM group subjects. The red solid and dotted line were the raw measured fluorescence spectrum and the recovered intrinsic fluorescence spectrum of DM group, respectively; Black solid and dotted line were the raw measured fluorescence spectrum and the recovered intrinsic fluorescence spectrum of NDM group, respectively. The raw measured fluorescence spectrum intensity of DM group was significantly higher than the NDM group. Both DM and NDM group, the measured fluorescence spectrum existed peaks at 450 nm, 480 nm and 510 nm. For the measured fluorescence of DM group, the peak at 510 nm was significantly higher than the peak at 480 nm, while for the measured fluorescence of NDM group, the peak at 510 nm was equal to the peak at 480 nm. The measured fluorescence of both DM and NDM group exist an obvious difference in curve shape. While during intrinsic fluorescence recovering, the fluorescence intensity of DM group decreased and the fluorescence shape of DM group was distorted; the fluorescence intensity of NDM group increased and the fluorescence shape of NDM group was distorted. After recovered, the fluorescence spectrum intensity of DM group was still significantly higher than NDM group, but the shape was very similar. For the recovered intrinsic fluorescence spectra of both DM and NDM group, the fluorescence emission peak at 450nm changed to 440 nm. The main emission peak changed to 480 nm, which was more consistent with matrix organization fluorescence spectrum stimulated emission of radiation.

Fluorescence classification for type 2 diabetes screening
The dimension of spectrum was generally high. For instance, the tissue intrinsic autofluorescence spectrum used in this study had 200 dimensions (from 400 nm to 600 nm). The high dimension of data space may cause complexity in implementation of the SVM algorithm because computation of all the inner products between the sample and support vectors in a high-dimensional feature space was complicated and time-consuming. In order to simplify the implementation of the SVM algorithm and improve its performance, principal component analysis (PCA) method was introduced to reduce the dimensions of the tissue intrinsic autofluorescence. When PCA was used to process autofluorescence spectra, it transformed wavelengths, the original spectral variables, into a set of PC spectra. Each original spectrum was a combination of the PC loading spectra that were orthogonal to each other. The PCs with negligible contributions to the variance of the data set were eliminated. The dimensions of the data set for developing the diagnostic algorithm can be significantly reduced without losing useful information. Figure 9 showed the contribution of PC to the variance of the 327 autofluorescence spectra. The PCs were calculated with a MATLAB based PCA program. As shown in the Fig., the first PC accounted for 92.8% of the total variance, the first two PCs accounted for 96.1% and the first five PCs accounted for 99.5%. In the process of SVM learning and testing, all of the tissue intrinsic fluorescence spectra were analyzed by PCA method and the first five PCs were chosen as the input vectors. The 327 samples were randomly divided into training set (75%) and testing set (25%). The training set was used to cross validation and then optimize the kernel parameters; the testing set was used to evaluate the performance of the SVM algorithms. Table 3 showed the results of the classification of the first five PCs. The results demonstrated that both the linear SVM algorithm and nonlinear SVM has a good performance in spectra classification for diabetes screening.

Conclusion
Tissue fluorescence spectroscopy is highly sensitive to the microenvironment inside the tissue, and has broad application prospects in cancer tissue detection, photodynamic therapy and other fields.
In this paper, tissue fluorescence and diffuse reflectance in different tissue optical properties were simulated by Monte Carlo method and tissue intrinsic fluorescence recovering algorithm, making use of a diffuse reflectance measurement taken at the same location, was established. The empirical parameters in tissue intrinsic fluorescence recovering algorithm were coded as a particle in solution domain, the classification performance was defined as the fitness, and then a particle swarm optimization (PSO) algorithm was established for empirical parameters optimization.
The tissue intrinsic fluorescence recovering algorithm was used in skin autofluorescence recovering, and then the recovered intrinsic fluorescence was analyzed for diabetes screening combing with SVM. The comparison results of FPG, HbA1c and noninvasive method were shown in Table 4. The first set of data showed the performance of FPG and HbA1c for type 2 diabetes screening introduced by American Diabetes Association. The second to fourth sets of data showed the comparison of FPG, HbA1c and noninvasive method introduced by different groups. While the last set data showed the performance of noninvasive method for type 2 diabetes screening in this paper. From Table 4, noninvasive method based on skin fluorescence showed a better performance than FPG and HbA1c in diabetes screening. And comparing to the previous work, tissue intrinsic fluorescence recovering by empirical approach based on PSO algorithm in this paper can improve the effect of diabetes screening based on fluorescence spectrometry, has an obvious progress.

Disclosures
The authors declare that there are no conflicts of interest related to this article.