Noninvasive transcutaneous bilirubin assessment of neonates with hyperbilirubinemia using a photon diffusion theory-based method

Transcutaneous bilirubinometers are widely used to screen for neonatal jaundice. However, their accuracy has been reported to be compromised at low and high bilirubin levels. We used a photon diffusion theory-based method valid in the 450–600 nm wavelength region to overcome this obstacle. Our clinical study showed that our system could properly determine transcutaneous bilirubin concentrations at total serum bilirubin levels higher than 14 mg/dL, where a commercial bilirubinometer failed to provide proper results in several cases. These findings suggest that photon diffusion theory could be employed to improve the core algorithm of modern bilirubinometers and enhance their applicability.

TcB, and the BiliChek uses multiple wavelengths between 400 nm and 760 nm to correct for differences in skin pigmentation and hemoglobin for accurate TcB measurement [7]. TcB mainly reflects bilirubin in extravascular spaces and may be affected by skin pigmentation and thickness. On the other hand, TSB reflects the intravascular bilirubin concentration [8]. Although TcB measurements provide reasonably accurate estimates and TcB devices are currently cleared by the US Food and Drug Administration for clinical use [9,10], a lack of correlation between TcB and TSB during phototherapy has been observed [11,12]. The BiliChek and JM-103 bilirubinometers significantly overestimate TSB in dark-skinned neonates, which may result in unnecessary or excessive treatment [13][14][15]. The discrepancy between BiliChek measurements and TSB tends to increase in infants with relatively high TSB values, especially in non-Caucasian newborns [16,17]. A bilirubinometer that overcomes this inaccuracy is therefore needed for neonates with high TSB values or those undergoing phototherapy.
Another accuracy-related issue is the choice of measurement site. Yamauchi et al. reported that the most reliable site for TcB measurements changes with advancing postnatal age [18]. Conceição et al. reported that TcB measured on the sternum was more accurate than TcB measured on the forehead [19]. The severity of jaundice may be assessed by Kramer's rule, because yellow skin discoloration usually progresses in a head-to-toe (cephalocaudal) manner as intradermal bilirubin deposition increases. Purcell et al. supported the hypothesis that the cephalocaudal progression of jaundice in newborns is a consequence of diminished capillary blood flow in distal parts of the body [20]. According to this hypothesis, newborn infants preferentially perfuse their head and the proximal parts of their body in the first few days of life, leading to higher temperatures and increased bilirubin deposition at these sites. Recent advances in spectroscopic optical coherence tomography could be used to investigate local bilirubin extravasation into the skin [21]. The skin perfusion gradient is thought to account for the uneven deposition of bilirubin in newborn skin even though the serum bilirubin level is the same throughout the infant's body. Therefore, TcB values measured at different sites deviate from TSB to different degrees.
In general, diffuse reflectance spectroscopy (DRS) can work in conjunction with a photon transport model to convert the diffuse reflectance spectrum into the absorption coefficient (μa) and reduced scattering coefficient (μs') of the tissue [22]. Because most photon transport models, such as diffusion theory, assume a homogeneous sample structure, interrogating a small sample volume improves model validity. Therefore, for skin measurement applications, it is advantageous to limit the probing volume to superficial regions. However, probing superficial tissues requires short source-detector separations, at which many photon transport models, including photon diffusion theory, become invalid [23]. The analytical photon diffusion model fails because most detected photons do not travel far enough to satisfy the diffusion approximation. We have demonstrated that, by using a diffusing probe with a highly scattering layer, the photon diffusion model can accurately translate diffuse reflectance spectra into the absorption and reduced scattering spectra of superficial tissues [24,25]. The present study aimed to investigate the capability of our system to accurately quantify TcB, even at bilirubin levels greater than 14 mg/dL or in the presence of melanin interference.

Diffuse reflectance spectroscopy system
The configuration of our spatially resolved DRS system is shown in Fig. 1(a). We used a diffusing probe to collect reflectance from the tissues. The diffusing probe was equipped with a highly scattering Spectralon slab (Labsphere, NH, USA) that efficiently diffuses the source light, so that the optical properties of the skin could be determined using a simplified photon diffusion equation. Four source fibers were placed on the upper surface of the Spectralon, and the detection fiber passed 1 mm through the Spectralon layer so that its tip was flush with the lower surface of the Spectralon. The detection fiber was connected to a spectrometer (QE65000, Ocean Optics, FL, USA), and the four source fibers were connected to the light source through an optical switch (Piezosystem Jena, Germany). All optical fibers in the probe were multimode fibers with a 440 μm core diameter and a numerical aperture of 0.22. The source-to-detector separations were set to 1.44, 1.92, 2.40, and 2.88 mm for measuring the cutaneous bilirubin of neonates. The light source was a xenon flash lamp (L11946, Hamamatsu), which provides a high-intensity continuous spectrum from 240 to 1000 nm. For safety, a long-pass filter (FEL0400, Thorlabs) was used to block ultraviolet light. Light passing through the filter was collimated by a lens (LA1951-ML, Thorlabs) and coupled into the input port of the multimode fiber switch. All devices were housed in a customized aluminum case, as shown in Fig. 1(b), so that they could be carried easily to the hospital for measurements. The spectrometer and optical switch were connected to a laptop and controlled using MATLAB software (MathWorks, Natick, MA, USA).

Fig. 1. (a) Configuration of the diffuse reflectance spectroscopy system and the diffusing probe used in this study. (b) Illustration of our diffuse reflectance spectroscopy system, consisting of a spectrometer, optical switch, xenon flash lamp, and customized diffusing probe, for measurements in the hospital.

Theoretical models
The photon propagation model for the diffusing probe has been described in detail in previous publications [24][25][26][27]; here, we recapitulate the key steps of the derivation. In a two-layer turbid medium system, the diffusion equation can be written as follows:

$$\frac{1}{c}\frac{\partial \Phi_i(\mathbf{r},t)}{\partial t} - D_i \nabla^2 \Phi_i(\mathbf{r},t) + \mu_{a,i}\,\Phi_i(\mathbf{r},t) = S_i(\mathbf{r},t), \qquad i = 1, 2,$$

where $D_i = 1/[3(\mu_{a,i} + \mu_{s,i}')]$ and $\Phi_i$ are the diffusion constant and the fluence rate in layer $i$, respectively, $S_i$ is the source term, $c$ is the speed of light in the medium, and $i = 1, 2$ is the layer index. The light source is expressed as $S_1 = \delta(x, y, z - z_0)$ and $S_2 = 0$, where $z_0 = 1/(\mu_a + \mu_s')$ is the depth of the point source [28]. In this modified two-layer geometry, the detector is located at the boundary between the first and second layers. The fluence rate at the detector has a closed-form expression in the Fourier domain, and the fluence rate in real space is obtained by an inverse Fourier transform. The spatially resolved reflectance is then calculated by integrating the radiance $L_2$ at the boundary over the backward hemisphere:

$$R(\rho) = \int_{2\pi} \left[1 - R_{\mathrm{fres}}(\theta)\right] L_2(\rho, \theta) \cos\theta \, d\Omega, \qquad L_2(\rho, \theta) = \frac{1}{4\pi}\left[\Phi_2 + 3 D_2 \frac{\partial \Phi_2}{\partial z} \cos\theta\right],$$

where $\rho = \sqrt{x^2 + y^2}$ and $R_{\mathrm{fres}}(\theta)$ is the Fresnel reflection coefficient for a photon with an incident angle $\theta$ relative to the normal to the boundary [29]. Each set of skin reflectance spectra was first calibrated against reflectance spectra measured from a silicone phantom with known optical properties to remove the instrument response. The calibrated spectra were then converted into the absorption and reduced scattering spectra of the turbid sample using the two-layer photon diffusion model and a least-squares curve-fitting algorithm (the "lsqcurvefit" nonlinear curve-fitting function in MATLAB).
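The calibration and inversion steps above can be illustrated with a short sketch. Because the two-layer diffusing-probe model of [24][25][26][27] is not reproduced in this section, the sketch substitutes the standard semi-infinite diffusion model of Farrell et al. (1992), with an assumed refractive index n = 1.4, and replaces MATLAB's lsqcurvefit with a brute-force grid search in NumPy. It shows the two operations described in the text: dividing out the instrument response using a phantom of known optical properties, and fitting μa and μs' at one wavelength to the reflectance at the four source-detector separations. The function names, phantom optical properties, and per-channel response values are hypothetical.

```python
import numpy as np


def farrell_reflectance(rho, mua, musp, n=1.4):
    """Semi-infinite diffusion-theory reflectance R(rho), after Farrell et al. (1992).

    rho in mm, mua and musp in 1/mm; inputs broadcast against each other.
    Stand-in only: this is NOT the two-layer diffusing-probe model of the study.
    """
    mutp = mua + musp                          # transport coefficient mu_t'
    albedo = musp / mutp                       # transport albedo a'
    mueff = np.sqrt(3.0 * mua * mutp)          # effective attenuation coefficient
    z0 = 1.0 / mutp                            # depth of the isotropic point source
    D = 1.0 / (3.0 * mutp)                     # diffusion constant
    rd = -1.440 / n**2 + 0.710 / n + 0.668 + 0.0636 * n  # internal diffuse reflection
    zb = 2.0 * D * (1.0 + rd) / (1.0 - rd)     # extrapolated-boundary distance
    r1 = np.sqrt(z0**2 + rho**2)               # distance to the real source
    r2 = np.sqrt((z0 + 2.0 * zb)**2 + rho**2)  # distance to the image source
    return albedo / (4.0 * np.pi) * (
        z0 * (mueff + 1.0 / r1) * np.exp(-mueff * r1) / r1**2
        + (z0 + 2.0 * zb) * (mueff + 1.0 / r2) * np.exp(-mueff * r2) / r2**2
    )


def calibrate(m_sample, m_phantom, rho, mua_ph, musp_ph):
    """Cancel the instrument response with a phantom of known optical properties."""
    return m_sample / m_phantom * farrell_reflectance(rho, mua_ph, musp_ph)


def fit_optical_properties(r_cal, rho, mua_grid, musp_grid):
    """Brute-force search for (mu_a, mu_s') minimizing squared log-residuals."""
    model = farrell_reflectance(rho[None, None, :],
                                mua_grid[:, None, None],
                                musp_grid[None, :, None])
    cost = np.sum((np.log(model) - np.log(r_cal)) ** 2, axis=-1)
    i, j = np.unravel_index(np.argmin(cost), cost.shape)
    return mua_grid[i], musp_grid[j]


# Synthetic check at one wavelength (all optical properties are hypothetical)
rho = np.array([1.44, 1.92, 2.40, 2.88])   # separations from the text, mm
mua_grid = np.geomspace(0.001, 1.0, 121)   # 1/mm
musp_grid = np.geomspace(0.3, 5.0, 121)    # 1/mm
true_mua, true_musp = mua_grid[60], musp_grid[70]
response = np.array([0.8, 1.1, 0.9, 1.3])  # hypothetical per-channel instrument response
m_skin = response * farrell_reflectance(rho, true_mua, true_musp)
m_phantom = response * farrell_reflectance(rho, 0.01, 1.0)
r_cal = calibrate(m_skin, m_phantom, rho, 0.01, 1.0)
mua_fit, musp_fit = fit_optical_properties(r_cal, rho, mua_grid, musp_grid)
```

In practice the fit would be repeated at every wavelength to build the μa and μs' spectra. A gradient-based solver (lsqcurvefit in MATLAB, or scipy.optimize.least_squares in Python) is faster than a grid search but needs a reasonable starting point.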
The recovered absorption spectra were then fit to the known absorption spectra of the main chromophores to determine the chromophore concentrations of the phantoms or tissue based on the following equation [30]:

$$\mu_{a,\mathrm{skin}}(\lambda) = C_{\mathrm{HbO}}\,\varepsilon_{\mathrm{HbO}}(\lambda) + C_{\mathrm{Hb}}\,\varepsilon_{\mathrm{Hb}}(\lambda) + C_{\mathrm{melanin}}\,\varepsilon_{\mathrm{melanin}}(\lambda) + C_{\mathrm{bilirubin}}\,\varepsilon_{\mathrm{bilirubin}}(\lambda),$$

where C and ε represent the concentration and extinction coefficient of each substance, respectively [31][32][33]. The "lsqcurvefit" nonlinear curve-fitting function in MATLAB (MathWorks, MA, USA) was employed to obtain these concentrations.
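Once the extinction spectra are tabulated, the chromophore fit in the equation above is a linear least-squares problem. The sketch below solves it with numpy.linalg.lstsq rather than the lsqcurvefit routine used in the study. The extinction spectra here are synthetic Gaussian and exponential placeholders chosen only to be linearly independent; they are not the literature spectra cited in the text and would have to be replaced by tabulated data in any real use. The helper names are hypothetical.

```python
import numpy as np

wavelengths = np.arange(450.0, 601.0, 5.0)  # nm; 450-600 nm window from the abstract


def band(center, width):
    """Gaussian band used to build placeholder spectra (not real chromophore data)."""
    return np.exp(-0.5 * ((wavelengths - center) / width) ** 2)


# Hypothetical stand-ins for tabulated extinction spectra.
# Columns: HbO, Hb, melanin, bilirubin.
epsilon = np.column_stack([
    band(542, 15) + band(577, 10),          # HbO-like double peak
    band(556, 20),                          # Hb-like single peak
    np.exp(-(wavelengths - 450.0) / 80.0),  # melanin-like monotonic decay
    band(470, 25),                          # bilirubin-like peak near 470 nm
])


def unmix(mua, epsilon):
    """Least-squares estimate of C in mua(lambda) = sum_i C_i * eps_i(lambda)."""
    conc, *_ = np.linalg.lstsq(epsilon, mua, rcond=None)
    return conc


true_conc = np.array([0.02, 0.01, 0.05, 0.03])  # arbitrary units
mua = epsilon @ true_conc                       # noise-free synthetic mu_a spectrum
conc = unmix(mua, epsilon)
```

On noisy spectra, unconstrained least squares can return small negative concentrations; a bounded solver such as scipy.optimize.lsq_linear with bounds=(0, np.inf) is one way to enforce non-negativity.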

Skin-mimicking phantoms with different bilirubin and coffee concentrations
We fabricated a series of phantoms using 20% gelatin (Sigma-Aldrich, G2500-500G) as a base to imitate human skin, TiO2 particles as the scatterer, and bilirubin (Sigma-Aldrich, B4126-1G) at different concentrations as the absorber. Coffee powder (Rich Blend, NESCAFE, Taiwan) mixed with water was used to mimic skin melanin [34]. Bilirubin powder was mixed with bovine serum albumin (UR, BSA001-100G) to help it distribute homogeneously in the gelatin phantom. All base materials were prepared in a single batch and then mixed separately with different concentrations of coffee and bilirubin powder to minimize fabrication error. First, the gelatin powder was slowly added to deionized water and heated in a microwave oven to 60 °C. The TiO2, bilirubin, and coffee powder were then added slowly with gentle stirring. Finally, all mixtures were homogenized using an ultrasonic cleaner, poured into separate molds, and allowed to cool for 1 h.
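The amount of powder needed for each phantom follows from unit arithmetic: mass (mg) = concentration (mg/dL) × volume (dL). The sketch below assumes a hypothetical phantom volume of 50 mL, which the text does not specify, and shows how small the bilirubin masses become at the low end of the concentration series used in the phantom study. The function name is illustrative.

```python
def powder_mass_mg(conc_mg_per_dl, volume_ml):
    """Mass of powder (mg) needed for a target concentration in a given phantom volume."""
    return conc_mg_per_dl * volume_ml / 100.0  # 1 dL = 100 mL


volume_ml = 50.0  # hypothetical phantom volume, not stated in the text
bilirubin_series = [0, 1, 3, 5, 10, 15, 20, 40]  # mg/dL, from the phantom study
masses = {c: powder_mass_mg(c, volume_ml) for c in bilirubin_series}
coffee_mass = powder_mass_mg(250, volume_ml)  # 250 mg/dL coffee base

for c, m in masses.items():
    print(f"{c:>3} mg/dL bilirubin -> {m:.1f} mg powder")
```

At this assumed volume, the 1 mg/dL phantom needs only 0.5 mg of bilirubin, which illustrates why weighing and dispersing such small amounts evenly is difficult.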

Measurements of newborns
We enrolled neonates delivered at Kaohsiung Veterans General Hospital between March and November 2018. The inclusion criteria were a gestational age of over 36 wk and a birth weight of more than 2000 g. The protocol was approved by the Institutional Review Board (No. VGSKS18-CT1-22), and written informed consent was obtained from the neonates' parents prior to the measurements. TSB was determined using a capillary sample gas analyzer (APEL Neonates BR-5200P), and TcB was determined using a Philips BiliChek and our DRS system. TcB measurements were performed three times at each skin site (the forehead and sternum) by the same person, and the mean of the three measurements was used. All measurements were completed within 30 min after capillary blood sampling for serum bilirubin measurement.

Phantoms with different bilirubin concentrations
We fabricated a series of homogeneous phantoms to verify the linear relationship between the bilirubin concentrations measured by our DRS system and the benchmark values. The main purpose of this phantom study was to verify that the modified diffusion model can decouple the light absorption contributions of pigmentation and bilirubin in the 400–600 nm wavelength region. It should be noted that our theoretical model assumes a homogeneous sample structure and cannot accurately recover the optical properties of layered samples such as human skin. Applying such a model to layered samples and real skin would introduce systematic errors in the derived absorption and reduced scattering coefficients.
The coffee concentration of the phantoms was fixed at 250 mg/dL, and eight groups were prepared with different bilirubin concentrations (0, 1, 3, 5, 10, 15, 20, and 40 mg/dL) (Fig. 2). Note that 40 mg/dL represents an extremely high bilirubin concentration for neonatal jaundice that is rarely seen clinically. Each phantom was measured at five points, and each point was measured five times, yielding a total of 25 diffuse reflectance measurements per phantom. The recovered absorption spectra were fit linearly with the known absorption spectra of the chromophores, including bilirubin, coffee, gelatin, and water, to extract the phantom chromophore concentrations based on the Beer-Lambert law. The fitting result is shown in Fig. 3. The recovered coffee and bilirubin concentrations of the phantoms and their deviations from the benchmark values are listed in Table 1. As Table 1 shows, the maximum deviation of the recovered coffee concentration was 15.8 mg/dL, and the recovered bilirubin concentrations deviated from the benchmark values by at most 0.93 mg/dL. The mean and standard deviation of the recovered coffee concentrations were 242.81 ± 7.03 mg/dL, and the mean absolute error of the recovered coffee concentrations relative to the benchmark value was 3.45%. The average deviation of the recovered coffee concentrations from the benchmark value was 8.7 mg/dL, whereas that of the recovered bilirubin concentrations was 0.41 mg/dL. The deviation of the recovered coffee concentration from the benchmark value was thus generally higher than that of the recovered bilirubin concentration. Because the only feature of the coffee absorption spectrum in the 400–600 nm region is a magnitude that decreases with wavelength, which resembles the wavelength dependence of light scattering by TiO2, the recovered coffee concentration and scattering coefficients were sometimes coupled.
On the other hand, bilirubin has an absorption peak around 470 nm; thus, the accuracy of bilirubin concentration recovery is less affected by variations in coffee concentration and light scattering. The results verified that the bilirubin concentrations recovered by our DRS system agreed excellently with the benchmark bilirubin concentrations over the range from 0 mg/dL to 40 mg/dL. Moreover, the coffee concentration could be properly determined, and no influence of coffee on the recovered bilirubin concentrations was observed.
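The deviation metrics reported for this phantom series (maximum deviation, average deviation in mg/dL, and mean absolute error in percent) can be computed as in the sketch below. The recovered values are hypothetical numbers, not the data of Table 1, and the function name is illustrative. The average deviation and the percentage error are linked through the benchmark value: for example, an average deviation of 8.7 mg/dL at a 250 mg/dL benchmark corresponds to about 3.5%.

```python
def deviation_summary(recovered, benchmark):
    """Summarize how recovered concentrations deviate from benchmark values.

    recovered, benchmark: equal-length sequences in the same unit (e.g. mg/dL).
    The percentage error is computed only where the benchmark is non-zero,
    since it is undefined for the 0 mg/dL bilirubin phantom.
    """
    abs_dev = [abs(r - b) for r, b in zip(recovered, benchmark)]
    rel = [d / b for d, b in zip(abs_dev, benchmark) if b != 0]
    return {
        "max_abs_dev": max(abs_dev),
        "mean_abs_dev": sum(abs_dev) / len(abs_dev),
        "mape_percent": 100.0 * sum(rel) / len(rel) if rel else float("nan"),
    }


# Hypothetical recovered coffee values for a 250 mg/dL benchmark (not study data)
coffee = deviation_summary([240.0, 255.0, 236.0, 250.0, 260.0], [250.0] * 5)
print(coffee)
```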

Phantoms with different bilirubin and coffee concentrations
Numerous in vivo studies have indicated that changes in melanin concentration affect TcB measurement results [11,35]. We therefore designed a phantom study to understand the influence of coffee on the bilirubin concentrations determined by our DRS system. Figure 4 shows photographs and compositions of the nine phantoms. The coffee concentration of each phantom was 250, 500, or 750 mg/dL, and the bilirubin concentration was 5, 10, or 15 mg/dL. The fitting results are shown in Fig. 5. As Fig. 5(a) shows, the coffee concentrations of the nine phantoms could be determined with little deviation from the benchmark values. In Fig. 5(b), the recovered bilirubin concentrations did not show a clear dependence on the coffee concentration; the correlation coefficient between the recovered bilirubin concentration and the coffee concentration was close to zero (r = 0.000006).
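The correlation coefficient quoted above is the Pearson r, which also appears in the in vivo comparisons. A minimal pure-Python version is sketched below. As an illustration, and assuming the nine phantoms form a full 3 × 3 grid of coffee and bilirubin levels (the assignment of levels to individual phantoms here is hypothetical), the benchmark concentrations themselves are uncorrelated, so a nonzero correlation between recovered bilirubin and coffee would have to come from the measurement rather than from the phantom design.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Benchmark levels of an assumed full 3 x 3 design (ordering hypothetical)
coffee = [250, 250, 250, 500, 500, 500, 750, 750, 750]  # mg/dL
bilirubin = [5, 10, 15, 5, 10, 15, 5, 10, 15]           # mg/dL
print(pearson_r(coffee, bilirubin))  # 0.0
```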
The recovered coffee and bilirubin values of the phantoms and their deviations from the benchmark values are listed in Table 2. The mean values and standard deviations of the recovered coffee concentrations were 257.14 ± 8.58, 517.29 ± 11.57, and 748.84 ± 18.89 mg/dL for the three different coffee concentration sets. The maximum deviation of the recovered coffee concentration was 30.6 mg/dL. Overall, it could be observed that the recovered coffee concentrations were not affected by the variation in bilirubin concentration.
The mean values and standard deviations of the recovered bilirubin concentrations were 5.99 ± 0.50, 10.08 ± 0.82, and 16.40 ± 0.19 mg/dL for the three bilirubin concentration sets. The mean absolute error of the recovered bilirubin concentrations relative to the benchmark was 11.75%. The maximum deviations of the recovered bilirubin concentrations from the benchmark were 1.50, 1.03, and 1.56 mg/dL for the three sets. The recovered bilirubin concentrations of CP1–CP3 and CP7–CP9 were higher than the benchmark values. This overestimation did not appear to depend on the coffee concentration, and we suspect that it was caused by artifacts during phantom fabrication. Two major sources of such artifacts were: 1. the amount of bilirubin powder added to the phantoms was small (~0.1 g); 2. it was difficult to distribute the hydrophobic bilirubin powder evenly in water-based gelatin phantoms.

In vivo neonatal bilirubin measurements
There were 27 neonates enrolled in this study, and their demographic characteristics are summarized in Table 3. We separated the neonates into two groups according to a TSB value of 14 mg/dL; following the guideline of the Kaohsiung Veterans General Hospital, 14 mg/dL was used as the threshold for a high bilirubin value, at which further investigation and treatment were implemented. Gestational age, birth weight, and postnatal age did not differ significantly between the two groups. The TSB levels of all neonates ranged from 3.2 mg/dL to 19.9 mg/dL, and seven neonates had TSB levels greater than 14 mg/dL. None of these seven neonates had received phototherapy before the measurements were performed. Gestational maturity is related to neonatal skin thickness [36], and it has been suggested that the correlation between TcB and TSB may be influenced by skin maturity (skin thickness) [37]. Figure 6 shows the dependence of μa and μs' at λ = 450 and 600 nm on gestational maturity (gestational + postnatal age) at the sternum of the 27 neonates. Little correlation was observed between μs' at 600 nm and gestational maturity, which may suggest that the skin structures of the neonates under investigation were similar. Interestingly, the absorption coefficients of the neonates recruited in this study were lower, and their reduced scattering coefficients higher, than those reported in Bosschaart's work [38]. Further investigation is needed to determine whether this difference is caused by regional differences or by a systematic shift introduced by the theoretical modeling. The TcB value in the skin volume probed by our DRS system at the forehead is approximately a factor of 2 lower than the corresponding TSB value (linear regression on all 27 measurements yields TcB = 0.44*TSB + 1.34).
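One way to apply the regression correction used in this section is to invert the reported relation TcB = slope × TSB + intercept, which maps a raw DRS reading onto the TSB scale. The sketch below assumes that this inversion is what "corrected by the linear regression formula" means; the coefficients are those reported in the text for the forehead (0.44, 1.34) and the sternum (0.49, 0.91), and the dictionary and function names are hypothetical.

```python
# Regression coefficients reported in the text: raw TcB = slope * TSB + intercept
CALIBRATION = {
    "forehead": (0.44, 1.34),
    "sternum": (0.49, 0.91),
}


def corrected_tcb(raw_tcb, site):
    """Map a raw DRS TcB reading (mg/dL) onto the TSB scale by inverting the site regression."""
    slope, intercept = CALIBRATION[site]
    return (raw_tcb - intercept) / slope


print(round(corrected_tcb(7.94, "forehead"), 2))  # 15.0
```

Because both slopes are about 0.5, this correction roughly doubles any noise in the raw reading, which is one reason the precision of the raw DRS measurement matters.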
To compare our results with those of the BiliChek, we corrected the measured TcB using the linear regression formula. The corrected transcutaneous measurements at the forehead versus TSB are shown in Fig. 7(a) and Fig. 7(b). The Pearson correlation coefficients (r) between TSB and TcB were 0.88 for our DRS system and 0.89 for the BiliChek across all neonates. Both systems thus correlated well with TSB at the forehead. Figure 8 shows the Bland-Altman plots used to assess the agreement between TSB and each measurement system. The standard deviation of the differences was 2.52 mg/dL for our DRS system and 2.14 mg/dL for the BiliChek, and most data points from both systems fell within the limits of agreement (mean difference ± 1.96 standard deviations). However, the BiliChek tended to overestimate TSB, a trend also reported in other studies [39].

As mentioned in the introduction, many research groups have reported that the accuracy of commercially available bilirubinometers degrades at high TSB levels [16,17]. To evaluate the performance of our DRS system at high bilirubin levels, we separately analyzed the data with TSB concentrations greater than 14 mg/dL. Figure 9 shows the subset of the Fig. 7 data with TSB concentrations greater than 14 mg/dL. The Pearson correlation coefficients (r) were 0.76 and 0.56 for our DRS system and the BiliChek, respectively. Compared with the full data set in Fig. 7, the correlation coefficients decreased for both systems, which we suspect was caused by the smaller number of data points. Although our DRS system correlated more strongly with TSB than the BiliChek did, Fig. 9 shows no clear difference in performance between the two systems.

The TcB values measured by our DRS system at the sternum were corrected using the corresponding linear regression formula (TcB = 0.49*TSB + 0.91). The sternum measurements and the corresponding TSB values are plotted in Fig. 10. The Pearson correlation coefficients (r) between TSB and the TcB values determined with our DRS system and the BiliChek were 0.92 and 0.87, respectively. For our DRS system, the correlation with TSB was higher at the sternum than at the forehead, and the raw TcB values at the sternum were higher than those at the forehead. This finding agrees with a previous report that TcB measured at the sternum is more accurate than TcB measured at the forehead [19]. Note that the BiliChek returned OOR (out of range) readings for two neonates whose TSB levels were greater than 19 mg/dL; these data were not plotted in Fig. 10(b) and were excluded from the calculation of the Pearson correlation coefficient. Figure 11 displays the Bland-Altman plots derived from the data in Fig. 10. The standard deviation of the differences was 1.99 mg/dL for our DRS system and 1.97 mg/dL for the BiliChek. The average differences between TSB and the TcB values recovered by our DRS system and the BiliChek were 1.5 mg/dL and 3.7 mg/dL, respectively, and the BiliChek measurements systematically overestimated the bilirubin concentration.

… BiliChek at the sternum. The data encircled in (a) indicate the two neonates whose TSB levels were greater than 19 mg/dL and could not be determined using BiliChek.

We also separately analyzed the sternum data with TSB concentrations greater than 14 mg/dL, as illustrated in Fig. 12. The Pearson correlation coefficients (r) were 0.85 and 0.38 for our DRS system and the BiliChek, respectively. Only five data points could be analyzed for the BiliChek because two cases with TSB levels greater than 19 mg/dL were unmeasurable with that device. Given this reduced number of data points, the correlation coefficient determined here may not reflect the true performance of the BiliChek. In contrast, our DRS system provided readings for all seven neonates, including those with TSB values greater than 19 mg/dL.
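The Bland-Altman statistics used in these comparisons (mean difference and the ±1.96 standard-deviation limits of agreement) can be computed as sketched below, with a pair excluded whenever either reading is missing, mirroring the exclusion of the two out-of-range BiliChek readings. Representing an out-of-range reading as None, computing differences as test minus reference, the function name, and all numeric values are assumptions for illustration, not study data.

```python
import statistics


def bland_altman(reference, test):
    """Bias and 95% limits of agreement, skipping pairs with a missing reading.

    Out-of-range / missing readings are represented as None (hypothetical convention).
    Differences are computed as test - reference.
    """
    pairs = [(r, t) for r, t in zip(reference, test) if r is not None and t is not None]
    diffs = [t - r for r, t in pairs]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return {"n": len(diffs), "bias": bias, "sd": sd,
            "loa": (bias - 1.96 * sd, bias + 1.96 * sd)}


# Hypothetical readings in mg/dL; None marks an out-of-range device result
tsb = [10.0, 12.0, 15.0, 19.5, 20.0]
tcb = [11.0, 14.0, 16.0, None, None]
result = bland_altman(tsb, tcb)
print(result["n"], round(result["bias"], 3))
```

Dropping incomplete pairs keeps the statistics well defined, but, as noted above for the BiliChek, it also removes exactly the high-TSB cases of most clinical interest, so the reported agreement can look better than the device's true behavior at high bilirubin levels.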
Our results suggest that our DRS system remains reliable when measuring neonates with high bilirubin levels at the sternum. Previous research has shown that TcB is affected by various factors, such as melanin and hemoglobin concentrations [13]. The algorithms of many bilirubinometers do not properly account for the influence of skin melanin and/or use an estimated hemoglobin concentration for all cases [7,35]. Other transcutaneous bilirubinometers, such as the BiliChek, use multiple wavelengths to correct for the contributions of skin pigmentation and hemoglobin based on an empirical database or algorithm [7,40]. Since the concentrations of melanin and hemoglobin vary with body site and the physiological condition of the subject, empirical methods may not be applicable in all cases. In this study, we employed a custom DRS system that works in conjunction with a physics model to accurately and efficiently recover the TcB of skin in vivo. Our DRS system performed reliably at different skin sites and over a wide range of bilirubin values, including bilirubin levels greater than 19 mg/dL. However, this study could be improved in several respects. First, all neonates enrolled in this study had gestational ages over 36 wk and were Asian, so the effects of gestational age and ethnicity remain unclear. Second, only seven infants with TSB concentrations over 14 mg/dL were analyzed. Additional data at higher TSB values are needed to properly assess the system accuracy in hyperbilirubinemia, and further study is needed to evaluate different age groups, birth weights, and degrees of superficial tissue maturity. Third, the measurement precision of our system needs to be improved; in this study, it was not better than that of the BiliChek.
We found that neonatal skin is generally softer than adult skin, and the capillary refill time in neonates is longer than that in adults. Thus, slight variations in the contact pressure of the optical fiber probe could introduce noticeable changes in the recovered optical properties in this neonatal study, an effect not seen in our previous adult study. It has been reported that variation in probe contact pressure greatly affects measurement precision [41,42]. We plan to investigate probe pressure monitoring solutions that can be integrated into our probe to improve the measurement precision of our system. In addition, to reduce operator-dependent error, our DRS system and the BiliChek were operated by the same person. For each subject, the BiliChek was always applied first to the forehead and then to the sternum, and DRS measurements were subsequently carried out on the same skin areas. Whether this measurement sequence affects the local physiology of the skin, e.g., microcirculation, is currently unclear.
There are several commercially available bilirubinometers for TcB measurements. Some, such as the Dräger JM-105, employ LEDs and photodetectors to collect reflected light at several wavelengths, whereas others, such as the Philips BiliChek, use a broadband light source and a spectrometer to acquire a broadband reflectance spectrum. The high price of spectrometer-based systems limits their adoption by many hospitals in resource-limited regions. Our system also requires a spectrometer to measure the skin reflectance spectrum for determining bilirubin concentration; thus, its cost would be higher than that of photodetector-based systems. In future studies, we will investigate the use of miniaturized, low-cost spectrometers to decrease the cost and size of our system. Overall, our preliminary data showed that our DRS system provided reliable TcB measurements over a wide range of bilirubin values, and the Pearson correlation coefficient between the derived TcB and TSB was at least 0.88 at both skin sites (forehead and sternum). Such a system is crucial for the diagnosis of neonatal jaundice, especially for premature neonates who usually have limited body sites available for measurement or who have severe hyperbilirubinemia and need to be checked frequently.

Conclusion
In this study, we developed a custom DRS system to provide reliable determination of neonatal bilirubin concentration. The performance of our DRS system was carefully validated through phantom studies. We then conducted in vivo measurements with our DRS system and the Philips BiliChek on the forehead and sternum of 27 neonates, and found that our DRS system tracked TSB variations properly and did not overestimate bilirubin levels. Most importantly, our DRS system provided readings for neonates with TSB concentrations greater than 19 mg/dL, for whom the BiliChek returned out-of-range results. We will extend this study to more subjects to investigate the capability of our DRS system to measure bilirubin concentrations in neonates with various birth weights, postnatal ages, and ethnicities.