Comparison of tissue oximeters on a liquid phantom with adjustable optical properties: an extension

Cerebral near-infrared spectroscopy (NIRS) oximetry may help clinicians to improve patient treatment. However, the application of NIRS oximeters is increasingly causing confusion to the users due to the inconsistency of tissue oxygen haemoglobin saturation (StO 2) readings provided by different oximeters. To establish comparability of oximeters, in our study we performed simultaneous measurements on a liquid phantom mimicking properties of neonatal heads and compared the tested devices to a reference NIRS oximeter (OxiplexTS). We evaluated the NIRS oximeters FORE-SIGHT, NIRO and SenSmart, and reproduced previous results with the INVOS and OxyPrem v1.3 oximeters. In general, linear relationships of the StO 2 values with respect to the reference were obtained. Device-specific hypoxic and hyperoxic thresholds (as used in the SafeBoosC study, www.safeboosc.eu) and a table allowing for conversion of StO 2 values are provided.


Introduction
Near-infrared spectroscopy (NIRS) is a technique for measuring oxygenation of tissue noninvasively and continuously. One important application may be to prevent cerebral haemorrhagic and ischaemic insults in preterm infants. A randomized controlled trial safeguarding the brains of our smallest children (SafeBoosC) showed that it is possible to reduce the cerebral hypoxic/hyperoxic burden in extremely preterm infants, when combining NIRS monitoring with a treatment guideline [1]. This also led to a substantially reduced mortality and incidence of severe brain lesions in the treatment group. However, this reduction was not statistically significant and therefore a large study is planned to confirm the clinical benefits. One major issue here is that different brands of NIRS oximeters provide systematically different oxygenation values, as shown by numerous studies [2][3][4][5][6][7][8]. Currently, tissue oxygen haemoglobin saturation (StO 2 ) values cannot be compared between different oximeters and sensors [9,10]. This constitutes a problem when setting alarm limits in the mentioned new trial or when interpreting the literature [11].
Why do different NIRS oximeters provide different values? NIRS oximeters measure absolute values of the StO 2 , which represents the proportion of oxygenated haemoglobin of all haemoglobin present in arterial, capillary and venous compartments of the tissue interrogated by the sensor. The reason for the different values between different brands is that it is difficult to validate StO 2 in vivo. One approach is to take arterial and venous (jugular vein for cerebral measurements) blood samples and to determine their oxygen haemoglobin saturation (SO 2 ) by co-oximetry, a trusted method.
An ethical problem here is that venous blood samples from the jugular bulb cannot be collected without risk to the patient [12]. In particular in preterm infants, sampling the jugular bulb is clinically not feasible. For ethical reasons, these studies are therefore typically either performed in healthy young adults or patients requiring a jugular bulb catheter for clinical reasons. For the latter an influence of their pathology is likely. Both scenarios may not be considered fully representative of the wide range of patients in hospitals.
A methodological problem is that the venous jugular bulb blood represents an average of the whole brain hemisphere, which is much larger than the volume of the NIRS measurement it is being compared to [13]. In addition, there may be extra-cerebral contamination [14]. To determine the StO 2 the arterial-to-venous volume ratio (AVR) has to be assumed (typically 25:75 % or 30:70 %). Although a positron emission tomography (PET) study estimated AVR to be 30:70 [15], this is a source of errors, because the AVR was shown to vary considerably between individuals [16][17][18]. Furthermore, the AVR depends on the specific location measured and it changes over time [19], in particular during desaturation experiments [2]. The AVR is further influenced by e.g. the end tidal CO 2 , which must be kept constant [20] and the subject's positioning [2] which may cause systematic differences between studies. A change in the group AVR assumption by 10 % (i.e. 20:80 % or 40:60 %) causes a change in bias of already ±3 % [20]. But a recent meta-analysis on published cerebral StO 2 acquired with the INVOS oximeter reported that even higher AVR of up to 75:25 % fit their data best [21]. These examples show that already the in vivo reference StO 2 is associated with a number of assumptions and uncertainties on group and patient levels, is subject to considerable random and systematic errors, and does not constitute a 'gold standard'.
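To make the sensitivity to the AVR assumption concrete, the following minimal sketch computes the volume-weighted reference StO 2 from an arterial and a jugular venous saturation. The function name and the saturation values (98 % arterial, 68 % venous) are illustrative assumptions and not data from the cited studies. With this assumed 30-point arterio-venous difference, shifting the arterial fraction by 10 points moves the reference by 3 points, which is of the same size as the ±3 % change in bias quoted above.

```python
def reference_sto2(sao2, svo2, arterial_fraction):
    """Weighted in vivo reference StO2 (%) for an assumed arterial volume fraction."""
    if not 0.0 <= arterial_fraction <= 1.0:
        raise ValueError("arterial_fraction must lie between 0 and 1")
    return arterial_fraction * sao2 + (1.0 - arterial_fraction) * svo2


# Illustrative saturations in percent (assumed, not measured values)
SAO2, SVO2 = 98.0, 68.0

for arterial_fraction in (0.2, 0.3, 0.4):
    sto2 = reference_sto2(SAO2, SVO2, arterial_fraction)
    venous_fraction = 1.0 - arterial_fraction
    print(f"AVR {arterial_fraction:.0%}:{venous_fraction:.0%} -> reference StO2 = {sto2:.1f} %")
```

The size of the shift scales with the arterio-venous saturation difference, so the error from a wrong AVR assumption would be expected to grow during desaturation when that difference widens.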
Apart from problems with the reference StO 2 , there are further problems encountered during in vivo validation by desaturation studies: Although short episodes of hypoxia with arterial oxygen haemoglobin saturation (SaO 2 ) as low as 50-70 % are tolerated well by healthy adults [22], the very low range of StO 2 cannot be validated safely with this method, leaving questions on the validity of extreme (high and low) readings which may occur in clinical situations.
Another issue is that changes in StO 2 originate from the brain and partly from extra-cerebral layers. In adults a significant influence of scalp and skin on the cerebral StO 2 values was found [23, 24] even though the effect of the skull layer, which is likely substantial [25,26], was neglected. However, StO 2 is changed everywhere and not only in the brain in desaturation studies. Thus, the ability of a NIRS oximeter to measure brain StO 2 independently of the more superficial tissues is not assessed by current methods.
A desaturation study comparing several cerebral oximeters found 'significantly more positive bias at lower SaO 2 ' in some oximeters [2], which corresponds to lower sensitivity to oxygenation changes. The difference between oximeters is striking, because most oximeters in that study were calibrated by this method of arterial and venous blood sampling. Hence, this approach results in inconsistent StO 2 values between different instruments, and therefore seems inappropriate for validation of cerebral StO 2 .
A further option for in vivo validation is the vascular occlusion test on extremities. The ischaemia induced by the occlusion allows a wide range of StO 2 , but otherwise the same mentioned issues as for the brain remain unresolved.
Phantoms, on the other hand, have the advantage of controllable optical properties and can be adjusted for specific research questions and may include sophisticated geometries. In 1993, Firbank et al. recommended a solid phantom made of optically clear polyester resin. Dyes and titanium dioxide were added to adjust its optical properties [27]. 3D printers allow producing anatomically accurate, tissue-equivalent phantoms of infant heads [28] and mice [29]. However, since oximeters include a different set of wavelengths each, it is more appropriate to include real haemoglobin to obtain an accurate absorption spectrum. In addition, StO 2 and total haemoglobin concentration (c tHb ) can be changed over the entire relevant range [30-33]. One example is the "dynamic phantom brain model" [34]. This model consisted of (1) a resin with a vascular network (500 µm diameter) which represents brain and is perfused by human blood, (2) a bubble oxygenator to change the oxygenation of the blood and (3) a roller pump [34]. Another method is to fabricate microtubes for the vascular network [35]. The inner diameter of these microtubes (300 µm), however, was still larger than that of human capillaries (9 µm) [36], which is a relevant difference for NIRS oximetry [37].
The most promising method to compare NIRS devices in our opinion is a dynamic 2-layer phantom with adjustable optical properties mimicking the neonatal head as described in detail in [38]. This phantom contains Intralipid to adjust scattering and real human haemoglobin. The aim was to apply this phantom to quantitatively compare different NIRS devices for SafeBoosC to establish comparable intervention thresholds between devices. We provided mathematical equations to convert StO 2 between devices. We demonstrated that the systematic differences in the StO 2 values between NIRS oximeters also depend on the c tHb of the phantom, which may be due to different assumptions regarding background absorbers [39].
In our previous comparison studies in phantoms [38, 40-42] we observed substantial systematic differences between in vivo calibrated NIRS oximeters. Such differences were also shown in vivo [2-8] and are probably due to the error-prone in vivo calibration rather than the simple nature of the phantom [20]. The oximeters included in [38] and here can be assumed to apply the multi-distance approach for calculation of StO 2 , since they provide more than one source-detector separation. Such oximeters are minimally influenced by a superficial layer of 2.5 mm [26,42] independent of epidermal pigmentation which leads to a lower intensity of detected light, thus lowering the SNR. Therefore, our phantom model with a static superficial layer is an appropriate model for neonates. We included adult sensors in our experiments since adult sensors have been used off-label in neonates. In adults the skull/scalp/skin region is > 1 cm thick and extra-cerebral signals contribute substantially to the StO 2 . Therefore, the results are not to be translated to adults. Although it is currently unclear how a good model of the adult head can be achieved best, we have shown that phantoms like ours have the capability to assess sensitivity of instruments to deeper layers [42]. Phantoms provide for simultaneous measurements by different NIRS oximeters on truly the same sample and over a wide StO 2 range. Even though not yet fully completed, it is possible to include a precise and accurate StO 2 reference for validation of accuracy. Thus, phantoms are a versatile tool to validate NIRS oximeters reproducibly.
In this paper, the aim was to extend our previous work [38] with a new set of devices and sensors to provide clinical researchers and clinicians a means to translate neonatal cerebral StO 2 acquired by their oximeter to the results reported by others, using other oximeters. The three additional oximeters NIRO-200NX (Hamamatsu), FORE-SIGHT Elite (Casmed) and SenSmart-X100 (Nonin) were compared with OxiplexTS (ISS) as a reference. Measurements for the INVOS 5100C (Medtronic) and OxyPrem v1.3 (in-house developed, University Hospital Zurich) oximeters were repeated with the aim to test reproducibility of the liquid phantom method.

NIRS oximeters
Some of the instruments used in [38] and this experiment were calibrated in vivo, which only probes a limited range of StO 2 safely [20, 43-47]. In neonates, however, StO 2 often lies and is considered outside this calibrated range. We thus report data from the whole range recorded. We solely describe oximeters and sensors tested for the first time with this method. For a description of previously tested devices please refer to [38].
The NIRO-200NX (Hamamatsu Photonics K.K., Hamamatsu, Japan) applies an LED source with three wavelengths (735 nm, 810 nm, 850 nm) in disposable adhesive (NIRO small/ NIRO large) and re-usable (NIRO small RU/NIRO large RU) sensors with source-detector separation (SDS) of 3 cm (small) and 4 cm (large). Detectors of the disposable sensors and the re-usable probes have different shapes. The device combines modified Beer-Lambert law and spatially resolved spectroscopy for trend and absolute measurements. It is approved for clinical use.
The FORE-SIGHT Elite (CAS Medical Systems, Inc., Branford, CT, USA) applies five wavelengths (690, 730, 770, 810, and 870 nm) to measure absolute StO 2 . The large sensor (FORE-SIGHT adult) comprises SDS of 1.5 and 5 cm while the medium sensor (FORE-SIGHT medium) has SDS of 1.3 and 4 cm. Small sensors for neonates use SDS of 1 and 2.5 cm and are available as adhesive (FORE-SIGHT small) and non-adhesive version (FORE-SIGHT small band). The device is approved for clinical use.
The SenSmart-X100 (Nonin Medical, Inc., Plymouth, MN, USA) applies four wavelengths (730, 760, 810 and 870 nm). The sensors have two light sources and two detectors giving four light paths with SDS of 2 and 4 cm in case of the adult sensor EQUANOX Advance 8004CA (SenSmart adult). The device is approved for clinical use.
The device we used as reference here, OxiplexTS (ISS, Champaign, Illinois, USA), employs two modulated light sources (692 nm and 834 nm) measuring absolute tissue absorption and reduced scattering coefficients. A rigid sensor with four SDS (2.5 cm, 3.0 cm, 3.5 cm and 4.0 cm) was applied. In contrast to other tissue oximeters, this device is not approved for clinical use, but has a CE mark for research. As we solely intended to compare instruments and not to assess accuracy, our reference does not have to measure the true StO 2 , but only needs to measure reproducibly and independent of c tHb and reduced scattering coefficient (µ s ). OxiplexTS measures both absorption coefficient (µ a ) and µ s and was validated in several publications such as [26, 39,48] and in vivo a precision of 2.0 % was demonstrated in term-born newborns [49], thus fulfilling our needs.

Phantom setup
As in the previous systematic in vitro comparison of NIRS oximeters [38], the setup consisted of a container offering four windows for simultaneous recording with NIRS sensors. Each window was made of a layer of silicone with tailored optical properties and thickness (2.5 mm) to resemble the scalp and skull of a typical neonate. The container was filled with a liquid containing human haemoglobin (Hb) and Intralipid (IL) which was added to obtain the wanted µ s of ≈ 5.5 cm −1 . Each phantom preparation covered typical c tHb in neonates (see Table 1). OxiplexTS (ISS, Inc., Champaign, IL, USA) was applied as a reference oximeter. The investigation was performed in three phantoms on three different days.

Phantom no. 1
The FORE-SIGHT Elite with FORE-SIGHT small and FORE-SIGHT small band as well as NIRO small were investigated with one deoxygenation per liquid mixture. Then FORE-SIGHT small, FORE-SIGHT small band and NIRO small were removed, remounted and another three deoxygenations were performed. FORE-SIGHT small reported an 'out of range' error in this period and did not provide data.

Phantom no. 2
Two groups of oximeters were investigated: Group 1 with SenSmart adult, FORE-SIGHT adult and NIRO large; Group 2 with FORE-SIGHT medium, NIRO small RU and NIRO large RU. Groups 1 and 2 were intermittently placed on the phantom. Two deoxygenations at a c tHb = 30 µM were performed with group 1 before switching to group 2 and performing another one. Then c tHb was increased to 47.5 µM and one deoxygenation was first measured with group 2 followed by group 1. After increasing c tHb to 75 µM, there were first two deoxygenations recorded with group 1 and then one with group 2.

Phantom no. 3
Here, the INVOS 5100C with adult SomaSensor SAFB-SM (INVOS adult), OxyPrem v1.3 and SenSmart adult were employed. For c tHb = 30 µM and 75 µM one deoxygenation each was performed. At c tHb = 47.5 µM we performed several deoxygenations and removed and remounted the sensors between them to test robustness to repositioning. During the second c tHb = 47.5 µM deoxygenation new sensors for INVOS adult and SenSmart adult were used and afterwards the original ones were reapplied.
The phantom consisted of the same ingredients as in [38], obtained from the same suppliers. The main ingredient was phosphate buffered saline (PBS) (Kreis, pH = 7.4) with a volume of 2.5 l to which 74 ml IL (20 %) and defined amounts of blood were added. We added sodium bicarbonate buffer (SBB) (8.4 % ≡ 1 mmol/ml) initially (15 ml) and each time when adding blood to the phantom (10 ml). We added 3 g of yeast dissolved in a small amount of SBB and 3 ml glucose (50 %) solution to the phantom to trigger deoxygenation. Each time blood was added, another 3 ml of glucose were added. The phantom was re-oxygenated by bubbling pure oxygen (O 2 ) through it. In [38], we covered the typical range of c tHb in neonates with c tHb = 25 µM, 45 µM and 70 µM. Here we adhered to these values. Table 1 shows the amounts of blood added and the effective c tHb of each phantom. Two erythrocyte concentrates with the same measured c tHb ≡ 215 g/l and htc ≡ 65.5 % were employed, one for phantom 1 and one for phantoms 2 and 3.
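The relation between the volume of erythrocyte concentrate added and the resulting c tHb can be sketched as a simple dilution calculation. This is a rough illustration under stated assumptions and does not reproduce Table 1: the molar mass of about 64,500 g/mol (haemoglobin tetramer) and the convention that µM refers to the tetramer are assumptions, the function names are hypothetical, and the volume of the yeast suspension and of later SBB and glucose additions is ignored.

```python
HB_MOLAR_MASS_G_PER_MOL = 64_500.0  # assumed: approximate molar mass of the Hb tetramer


def blood_molarity(blood_hb_g_per_l=215.0):
    """Molar Hb concentration (mol/l) of the erythrocyte concentrate."""
    return blood_hb_g_per_l / HB_MOLAR_MASS_G_PER_MOL


def blood_volume_for_target(target_um, base_volume_l, blood_hb_g_per_l=215.0):
    """Volume of concentrate (l) to add to base_volume_l to reach target_um (µM)."""
    c_blood = blood_molarity(blood_hb_g_per_l)
    c_target = target_um * 1e-6
    if c_target >= c_blood:
        raise ValueError("target concentration must be below that of the concentrate")
    # c_target = c_blood * V_blood / (V_base + V_blood), solved for V_blood
    return c_target * base_volume_l / (c_blood - c_target)


def resulting_concentration_um(blood_volume_l, base_volume_l, blood_hb_g_per_l=215.0):
    """c_tHb (µM) after adding blood_volume_l of concentrate to base_volume_l."""
    c_blood = blood_molarity(blood_hb_g_per_l)
    return 1e6 * c_blood * blood_volume_l / (base_volume_l + blood_volume_l)


# Base liquid before blood is added: 2.5 l PBS + 74 ml IL + 15 ml SBB + 3 ml glucose
BASE_VOLUME_L = 2.5 + 0.074 + 0.015 + 0.003

volume_l = blood_volume_for_target(30.0, BASE_VOLUME_L)
print(f"Concentrate needed for 30 µM: {volume_l * 1000:.1f} ml")
```

Because each blood addition is accompanied by further SBB and glucose, the effective c tHb reported in Table 1 remains the authoritative value; the sketch only indicates the order of magnitude of the volumes involved.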

Data extraction and analysis
To ensure consistency, data processing methods were the same as in [38]. We applied the same StO 2 limits for fitting to both axes, i.e. to OxiplexTS and the oximeter to be investigated. However, here we set the upper limit to StO 2 = 85 % (instead of 94 %) because StO 2 of FORE-SIGHT Elite was non-linear above this value for all sensors. The lower limit was kept at StO 2 = 16 %.
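The range restriction described above can be sketched as follows: a pair of simultaneous readings enters the fit only if both the reference value and the device value lie within the limits. This is a minimal illustration that assumes an ordinary least-squares line; the actual fitting procedure is the one described in [38], and the function name and synthetic data are hypothetical.

```python
def fit_within_limits(reference, device, lower=16.0, upper=85.0):
    """Least-squares line device = slope * reference + intercept,
    using only pairs where BOTH readings lie within [lower, upper]."""
    pairs = [(x, y) for x, y in zip(reference, device)
             if lower <= x <= upper and lower <= y <= upper]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two points inside the fitting range")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("readings inside the fitting range show no variation")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)  # squared Pearson correlation
    return slope, intercept, r_squared


# Synthetic example: a device that is linear at low StO2 and saturates at 80 %
reference = list(range(0, 101, 5))
device = [min(0.8 * x + 10.0, 80.0) for x in reference]

slope, intercept, r_squared = fit_within_limits(reference, device)
print(f"slope = {slope:.2f}, intercept = {intercept:.1f}, R^2 = {r_squared:.3f}")
```

In the synthetic data the saturating part lies above a reference value of 85 % and is therefore excluded, so the fit recovers the linear part only.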

Results
Figures 1(a)-1(d) and 2(a)-2(d) show StO 2 of the oximeters investigated for the first time, whereas Fig. 3(a) and 3(b) show the results of two oximeters already included in [38]. We added the data from [38] to the same plots for comparison. Table 2 shows the coefficients for linear fits in the range 16 ≤ StO 2 ≤ 85 % and their generally high correlation coefficients (R 2 ) except for SenSmart adult showing a remarkable curvature. We do not report results of FORE-SIGHT medium due to inconsistency with data obtained in another experiment (not published). We recorded pH and temperature (T) by repeated measurements in all three phantoms. Overall, pH was in the range 7.13 ≤ pH ≤ 7.64 with a general decrease over time and an increase during bubbling of O 2 . Values of temperature were in the range 35.1 °C ≤ T ≤ 36.4 °C with a variation of ≈ 0.5 °C within each single phantom. µ s as measured by OxiplexTS decreased from baseline by 11 % at the end of phantom 1 (after 210 minutes), by 16 % at the end of phantom 2 (after 275 minutes) and by 9 % at the end of phantom 3 (after 160 minutes).

Discussion
This study is an extension to the study in [38] with the aim to provide clinical researchers and clinicians a means to translate neonatal cerebral StO 2 acquired by their oximeter to the results reported by others, using other oximeters. As we used the same methods, we do not repeat the general discussion about aspects of the phantom but focus on new findings.

Observations
Relations between all oximeters investigated and OxiplexTS were linear, with the following exceptions: (1) Fig. 1(a), 1(b) and 1(c) reveal that the FORE-SIGHT Elite oximeter is less sensitive to oxygenation changes for StO 2 > 85 %. Here the relation is non-linear. This does not affect the coefficients because we set the fitting range 16 ≤ StO 2 ≤ 85 % to circumvent this problem. The slightly smaller fitting range compared to [38] does not affect the results because there was little noise and all relations were linear with high R 2 (Table 2) except for SenSmart adult (see (3) below).
(2) In Fig. 2(b) the c tHb = 30 µM curve of NIRO small is bent at StO 2 > 85 %, which has no influence on the results. This coincides with a manual pH measurement. It is probable that the hand-held pH-probe accidentally partially obstructed the light path of the sensor.
(3) SenSmart adult to OxiplexTS relation is non-linear ( Fig. 1(d)). We have observed this already in a previous study [42]. Although the linear fits (Table 2) here depend on the StO 2 range of fitting, we consider the deviation from the fit lines acceptable within the fit range, but extrapolation to higher or lower StO 2 is inaccurate. Not all manufacturers published their algorithms and therefore it is unknown why instruments over-/underestimate StO 2 . One reason may be unaccounted absorbers such as water (H 2 O) present in the phantom, particularly during low c tHb [39]. The phantom contains 98 % H 2 O and is similar to brain tissue of neonates with up to 95 % H 2 O [50]. This is discussed in detail in [38].
We observed a decrease in phantom µ s over time. The µ s depends on the number and size of sub-micrometer lipid droplets of the IL emulsion and their relative refractive index. We assume the decrease in µ s to be caused by confluence of these droplets into larger droplets. The StO 2 errors due to the variation in µ s were small. In repeated deoxygenations, there is no relevant deviation visible ( Fig. 1(a), 1(c), 1(d), 2(b), 2(d), 3(a), and 3(b)) except for the small one due to sensor repositioning (< 2 %, see Sec. 4.3). We therefore conclude that the phantom experiments may last up to 4.5 hours, as long as the decrease in µ s is in the range of this experiment (≤ 16 %).
We cannot tell why an error was reported after removal and reattachment of the adhesive sensor (FORE-SIGHT small). The values of this sensor agreed well with those of FORE-SIGHT small band beforehand (see Table 2), as expected since both sensors apparently are similar.
We do not report results for FORE-SIGHT medium, because its response at c tHb = 75 µM compared to the response at the lower c tHb was too different from the patterns observed with the large and both small sensors of the same oximeter. FORE-SIGHT medium data from another, unpublished experiment also showed different patterns for all three c tHb . Simultaneous StO 2 readings of NIRO small RU agreed well with another unpublished experiment. Thus, we exclude phantom 2 as reason for this deviation. We also exclude inadequate sensor placement because the sensor was repositioned several times. Probably the specific sensor used was simply defective and consequently we do not report results for FORE-SIGHT medium.
The non-linear behavior for the SenSmart adult was also observed in a previous phantom study [42], but with remarkably stronger curvature in the present study (Fig. 1(d)). In phantom 3 at c tHb = 47.5 µM, one measurement with a brand new SenSmart adult sensor agrees well with the other measurements (Fig. 1(d)). We therefore exclude a sensor defect. SenSmart adult was used in two different phantoms and the sensor was repositioned several times, hence we exclude inadequate sensor placement. Simultaneous readings of other oximeters in both phantoms were inconspicuous. We consequently exclude the phantoms 2 and 3 as reasons causing this observation. One difference between the present study and [42] is the optical properties of the windows. Both sets of windows were made from the same ingredients, but different quantities. At 692 and 834 nm, respectively, the windows of the current study and [38] had µ a of 0.059 and 0.057 [1/cm], and µ s of 5.0 and 4.4 [1/cm], while the windows of [42] had µ a of 0.10 and 0.11 [1/cm], and µ s of 9.6 and 8.3 [1/cm]. Accordingly, we speculate that in contrast to other oximeters, SenSmart adult may be sensitive to optical properties of the superficial layer, although it is only 2.5 mm thick. We decided to report results of SenSmart adult for comparison to [42], but this issue needs to be further investigated.

Reproducibility of the method
In our study, sensors were un-mounted and re-mounted between repeated measurements at the same c tHb (Fig. 1(a), 2(b), 1(d), 3(a), and 3(b)). The sensor repositioning causes slight variation in sensor location and pressure applied to the window. Repositioning errors were < 2 % StO 2 , which we consider acceptable. For INVOS adult and SenSmart adult we attached a brand-new sensor for one deoxygenation at c tHb = 47.5 µM in phantom 3. Both showed exactly the same behavior as the old sensors ( Fig. 1(d) and 3(a)), suggesting negligible variation between individual sensors.
INVOS adult and OxyPrem v1.3 oximeters were investigated in [38] and in the present study (Fig. 3(a) and 3(b)), which allowed us to determine the repeatability. Within the fitting range, differences were < 3 %. Table 4 shows that linear transformation from present results (in rows) to those of [38] (in columns) is y = 0.96 * x + 3.2 for INVOS adult and y = 0.96 * x + 2.6 for OxyPrem v1.3 and thus close to the perfect relationship of y = 1 * x + 0. Table 3 further shows that uncertainty range due to changes in c tHb is slightly increased compared to [38] for OxyPrem v1.3 at the hypoxic threshold and for INVOS adult at the hyperoxic threshold. This increase is most likely caused by the sensor re-mounting between the deoxygenations at c tHb = 47.5 µM which resulted in a slight shift of values (Fig. 3(b)). Compared to [38], estimation of SafeBoosC action thresholds only deviates by 1 % at the hypoxic threshold for INVOS adult and by 1 % at hyperoxic threshold for OxyPrem v1.3. We conclude that the phantom showed good reproducibility.
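As a minimal sketch of how such linear relations can be used, the code below applies the two repeatability transformations quoted above and shows how a conversion between two devices follows from their individual fits to a common reference. The function names and the coefficients for "device A" and "device B" are hypothetical and serve only as illustration; they are not values from Table 2 or Table 4. Any such conversion should be restricted to the fitting range of 16 ≤ StO 2 ≤ 85 %.

```python
def apply_linear(x, slope, intercept):
    """Evaluate y = slope * x + intercept."""
    return slope * x + intercept


def convert_between_devices(reading_a, fit_a, fit_b):
    """Convert a reading of device A to the scale of device B.

    fit_a and fit_b are (slope, intercept) of each device versus the
    common reference oximeter.
    """
    slope_a, intercept_a = fit_a
    slope_b, intercept_b = fit_b
    reference_value = (reading_a - intercept_a) / slope_a  # invert A's fit
    return apply_linear(reference_value, slope_b, intercept_b)


# Repeatability transformations quoted in the text (present study -> [38])
INVOS_PRESENT_TO_PREVIOUS = (0.96, 3.2)
OXYPREM_PRESENT_TO_PREVIOUS = (0.96, 2.6)

# Hypothetical fits versus the reference, for illustration only
FIT_DEVICE_A = (1.20, -10.0)
FIT_DEVICE_B = (0.80, 15.0)

for present in (40.0, 55.0, 70.0):
    previous = apply_linear(present, *INVOS_PRESENT_TO_PREVIOUS)
    print(f"INVOS adult: present {present:.0f} % -> [38] scale {previous:.1f} %")

converted = convert_between_devices(50.0, FIT_DEVICE_A, FIT_DEVICE_B)
print(f"Device A 50 % corresponds to device B {converted:.1f} %")
```

Routing every conversion through the common reference means that one fit per device is sufficient to relate any pair of devices, rather than one fit per pair.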

Implications of other haemoglobin species
The tested oximeters do not account for dyshaemoglobins, e.g. carboxy- and methaemoglobin, or fetal Hb, so their presence could be a source of error depending on their concentration and extinction coefficients [31]. We used adult donor blood for the phantom and hence fetal Hb was not present. Fetal Hb has nearly the same absorption spectra as adult Hb [52]. So, even in neonatal populations when fetal Hb may be dominant, this would at most introduce a negligible error [53]. Methaemoglobin has extinction coefficients in the near-infrared range that are close to those of oxygenated haemoglobin (O 2 Hb), while the carboxyhaemoglobin extinction coefficient is negligible [52]. Normally, methaemoglobin is < 2 % of all Hb. Methaemoglobin was not determined in the donor blood, but only increases slightly during storage [54]. No known agents for the formation of methaemoglobin were used during the experiment and hence, we do not consider methaemoglobin a likely source of error.

Conclusion
Extending our previous findings to further oximeter brands, we confirmed that different brands of oximeters provide different tissue oxygen haemoglobin saturation (StO 2 ) readings in a phantom designed to mimic the head of preterm infants. Preliminary clinical data are in agreement with the phantom results. We provided linear equations which translate data from one oximeter to another at a total haemoglobin concentration (c tHb ) typical for preterm infants. Accordingly, intervention thresholds have to be set specifically for each brand and sensor type. Additionally, these thresholds are subject to uncertainties arising from device-specific dependence of the StO 2 readings on the c tHb . Previous results were reproduced with < 3 % deviation. Reproducibility of hypoxic and hyperoxic thresholds was ≈ 1 %, and variation due to sensor removal and replacement was < 2 %. This demonstrates good reproducibility of the method.