Spatially offset Raman spectroscopy for in vivo bone strength prediction

Bone strength is a worldwide health concern. Although multiple techniques have been developed to evaluate bone quality, gaps remain. Here we report a non-invasive approach for predicting bone strength in vivo using spatially offset Raman spectroscopy. Raman spectra were acquired transcutaneously from the tibiae of mice aged 4 to 23 weeks and subsequently from the exposed bones. Partial least squares regression was applied to predict the areal bone mineral density (aBMD), volumetric bone mineralization density (vBMD), and maximum torque (MT) of each tibia, as quantified by dual-energy X-ray absorptiometry, microCT imaging, and biomechanical tests, respectively. Significant correlations were observed between the Raman spectral predictions and the reference values in all three categories. To our knowledge, this is the first demonstration of Raman spectroscopy predicting a biomechanical bone parameter (MT) in vivo with an uncertainty much smaller than the spread in the reference values.


Introduction
Osteoporosis is a serious disease that affects more than 2 million people in the US [1]. Without appropriate diagnosis and treatment, it usually leads to bone fragility and fracture, and its progression can cause further morbidity and even mortality. Bone density testing by dual-energy X-ray absorptiometry (DXA) is the current clinical standard for diagnosis. DXA measures areal bone mineral density (aBMD) and can reveal low bone mass and thus increased risk of fracture [2]. Despite its accuracy and convenience, DXA requires ionizing radiation, and aBMD can be a poor indicator in some circumstances, e.g. in early postmenopausal women [3]. X-ray computed tomography (CT) can likewise scan bones and generate high-resolution images of volumetric bone mineralization density (vBMD) [4]. However, CT requires a radiation dose around 100 times higher than DXA [5]. Magnetic resonance imaging (MRI), another non-invasive method commonly used to image the musculoskeletal system, has also been shown to measure bone density [6]. MRI is not frequently applied to bones, however, because of its high cost. More recently, a minimally invasive diagnostic technique named reference point indentation (RPI) was developed for assessing bone mechanical quality in vivo [7].
In addition to bone density and mechanical properties, chemical composition is a fundamental parameter related to bone quality. As an established non-invasive, label-free method for chemical characterization, Raman spectroscopy (RS) has been shown to measure the mineral and organic matrix components of bone in ex vivo studies [8][9][10]. Previous studies have shown that RS can measure perturbations of animal bone biochemistry due to disease or age [8,11] or exposure to heavy metals [12,13], as well as the effects of aging on human bone [14]. Significant correlations between Raman and biomechanical properties of bones have also been observed in these studies [8,11,13], with quantitative metrics suggesting that RS provides more accurate predictions of bone strength than the clinical parameter aBMD.
When performing in vivo Raman spectroscopy on bones, one major concern is the confounding spectrum of the overlying soft tissue. Because this tissue contains some of the same chemical constituents as bone (most notably Type I collagen), it is desirable to reduce its contribution to the measured spectrum. Near-infrared confocal Raman microscopy is generally unsuitable for transcutaneous bone characterization because of its limited penetration depth of approximately 500 µm in tissue [15]. Traditional wide-field illumination and collection from a surface spot several millimeters in diameter allows deeper regions to be probed, but the fraction of the signal originating from those regions is low compared with that from shallower regions. Spatially offset Raman spectroscopy (SORS), in contrast, introduces an offset between the illumination and collection regions to increase the fraction of signal from deeper regions, at the cost of lower overall signal strength [16]. Previous studies have used SORS to detect bone beneath soft tissue [17][18][19][20][21]. Schulmerich et al. [17] acquired SORS measurements on 32 live mice and their exposed bones. They successfully detected bone spectra beneath soft tissue, although performance was affected by system alignment and animal coat color. Buckley et al. [21] used in vivo SORS to demonstrate spectral differences, albeit statistically underpowered, between 6 healthy humans and 10 patients with osteoporosis. Our group has also used SORS to detect disease-related differences in murine bones ex vivo [8]. To remove the remaining contribution of the soft tissue overlying the bone, we also developed a simultaneous, overconstrained, library-based decomposition (SOLD) fitting method that separates the bone signal from transcutaneously acquired Raman spectra [18].
Furthermore, by using SORS to measure subcortical bone tissue from a mouse model of genetic osteogenesis imperfecta, we found that SORS was more sensitive to these specific disease-related biochemical changes than conventional Raman spectroscopy [20].
Here, we demonstrate that transcutaneous SORS measurements on live mice enable the prediction of three standard bone-quality metrics. We applied SORS to the right hind tibiae of mice and subsequently conducted DXA, microCT, and torsion testing to measure areal bone mineral density (aBMD), volumetric bone mineralization density (vBMD), and maximum torque (MT), respectively. Chemometric models were then constructed to predict aBMD, vBMD, and MT from the spectra. The results indicate for the first time that SORS spectra acquired transcutaneously in vivo can accurately predict bone mineralization and strength.

Specimens
The mice used in this study ranged in age from 4 to 23 weeks and came from two different strains: four male and four female C57BL/6J (black) mice at each of 4 and 23 weeks of age, and four male and four female B6(Cg)-Tyrc-2J/J (albino) mice at each of 4, 8, and 12 weeks of age, for a total of 40 mice. Each mouse was measured transcutaneously on the right hind tibia with a SORS instrument (see Fig. 1). Following the transcutaneous measurements, the mice were sacrificed, and the tibiae were extracted and evaluated by Raman spectroscopy. The exposed bone specimens were stored in phosphate-buffered saline and frozen for less than 24 hours before the subsequent microCT, DXA, and biomechanical testing.

Raman spectroscopy
All the Raman data in this study were obtained with an RS system that has been described previously [8,20]. Briefly, the sample was excited by a near-infrared laser (830-nm wavelength, 150-mW power) in a spot 230 µm in diameter. The Raman-scattered light was collected with a numerical aperture (NA) of 0.34 and imaged onto a circular bundle of 61 multimode fibers (0.27 NA, 100/120-µm core/cladding diameters), with the center fiber coregistered with the excitation spot on the sample. Light delivery and collection are both non-contact, with a consistent standoff distance of about 2 cm set by the focal length of the imaging/collection lens. At the other end of the bundle, the fibers were rearranged linearly at the entrance of an f/1.8 imaging spectrograph (Kaiser HoloSpec), and a charge-coupled device (CCD) array recorded the spectra of 40 separate fibers (limited by the array height). For the exposed-bone measurements, the laser illumination was defocused to a spot 1 mm in diameter that overlapped the image of the entire collection fiber bundle, providing a wide-field measurement. For each location on the mouse leg, spectra were acquired as five 60-second exposures and averaged.
Fiber spectra were processed as described previously [6], including cosmic-ray removal, readout and dark-current subtraction, and image aberration correction. For transcutaneous measurements, the 40 spectra were converted into three spectral averages, corresponding to the central region of the circular bundle end (henceforth "Ring 1") and two concentric rings around it ("Rings 2 and 3"), as illustrated in Fig. 1. These three averages correspond to spatial offsets between illumination and collection ranging from s = 0 to approximately s = 0.5 mm. For the wide-field exposed-bone measurements, a single average was computed from all 40 fibers' spectra. Five measurements along the midshaft of the tibia at 1-mm intervals were averaged to produce a single spectrum per tibia. A continuous wavelet transform background-correction algorithm subtracted a fluorescence estimate from each raw spectrum [22]. The spectra were then smoothed with a Savitzky-Golay filter [23] over a 3-pixel window to match the spectral resolution of the system (approximately 5 cm⁻¹).
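The ring averaging and smoothing steps above can be sketched in a few lines of NumPy/SciPy. This is a minimal illustration, not the authors' code: the fiber-to-ring assignment (7/12/21 fibers in the demo) is hypothetical, since the actual assignment depends on the bundle layout in Fig. 1, and the Savitzky-Golay polynomial order is not stated in the text, so order 1 is an assumption. The background-correction step is omitted.

```python
import numpy as np
from scipy.signal import savgol_filter


def ring_averages(fiber_spectra, ring_of_fiber):
    """Average the per-fiber spectra that share a ring label.

    fiber_spectra: array of shape (n_fibers, n_pixels), one row per fiber.
    ring_of_fiber: ring label for each fiber (hypothetical mapping).
    """
    fiber_spectra = np.asarray(fiber_spectra, dtype=float)
    ring_of_fiber = np.asarray(ring_of_fiber)
    return {int(r): fiber_spectra[ring_of_fiber == r].mean(axis=0)
            for r in np.unique(ring_of_fiber)}


def wide_field_average(fiber_spectra):
    """Single average over all fibers, as for the exposed-bone measurements."""
    return np.asarray(fiber_spectra, dtype=float).mean(axis=0)


def smooth_3px(spectrum, polyorder=1):
    """Savitzky-Golay smoothing over a 3-pixel window.

    With only 3 points, polyorder must be 0 or 1 for any smoothing to occur;
    a quadratic passes through all three points and returns the data unchanged.
    The order used in the study is not stated (assumption).
    """
    return savgol_filter(np.asarray(spectrum, dtype=float),
                         window_length=3, polyorder=polyorder)


# Hypothetical demo: 40 fibers split 7 / 12 / 21 among rings 1-3, random data.
labels = [1] * 7 + [2] * 12 + [3] * 21
rng = np.random.default_rng(0)
fibers = rng.normal(size=(40, 1024))
smoothed = {r: smooth_3px(s) for r, s in ring_averages(fibers, labels).items()}
```

For a 3-pixel window with order 0 or 1, the filter reduces to a 3-point moving average away from the spectrum edges, which is consistent with matching a resolution of a few pixels.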

Dual-energy X-ray absorptiometry
aBMD was measured on the right tibia using a PIXImus dual-energy X-ray absorptiometer (GE-Lunar), as described elsewhere [24]. Briefly, the equipment was calibrated daily on a mouse phantom provided by the manufacturer. X-ray absorptiometry data were collected and processed with provided software (Lunar PIXImus 2, Version 2.1, GE-Lunar).

Microcomputed tomography
The procedures for measuring vBMD using microCT have been described previously [8]. Briefly, a Scanco VivaCT 40 (Scanco Medical AG, Bassersdorf, Switzerland) with 0.5 µm isotropic resolution was used to measure vBMD in the proximal half of the tibia. vBMD was calculated from three-dimensional reconstructions of the microCT scans generated by complete image solution software (Scanco Medical, Bassersdorf, Switzerland).

Biomechanical testing
Following the DXA, microCT, and Raman measurements, tibiae were evaluated with a biomechanical torsion test described previously [8]. Briefly, tibiae were rehydrated and mounted on an EnduraTec TestBench™ system (200 N·mm torque cell; Bose Corporation, Minnetonka, Minnesota), with 4 mm of bone length exposed between the contact points. Bones were loaded in torsion at a rate of 1°/s until fracture, and the maximum torque (MT) applied prior to fracture was recorded. Because of their fragility, two bones from 4-week-old mice and one from an 8-week-old mouse broke before torsion testing (but after DXA and microCT), so only 37 MT values were obtained.
Torsional testing, the method chosen in this study, explores one mode of loading that commonly leads to long bone fractures in humans. Bending, another common fracture mode, is explored by other tests such as three-point bending, e.g. in a model of skeletal fragility [25].
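Before regression, these MT values were screened for outliers per age cohort (more than three median absolute deviations from the cohort median, as stated under Data analysis). A minimal sketch of that screen follows; it uses the raw MAD without the 1.4826 normal-consistency factor, because the text does not say whether the MAD was scaled, and the MT values and cohorts in the demo are hypothetical.

```python
import numpy as np


def mad_outliers(values, cohorts, n_mad=3.0):
    """Flag values more than n_mad median absolute deviations from their cohort median.

    Uses the raw MAD (no 1.4826 scaling), which is an assumption. Cohorts whose
    MAD is zero are left unflagged to avoid a degenerate threshold.
    """
    values = np.asarray(values, dtype=float)
    cohorts = np.asarray(cohorts)
    flagged = np.zeros(values.shape, dtype=bool)
    for c in np.unique(cohorts):
        in_c = cohorts == c
        dev = np.abs(values[in_c] - np.median(values[in_c]))
        mad = np.median(dev)
        if mad > 0:
            flagged[in_c] = dev > n_mad * mad
    return flagged


# Hypothetical MT values (N*mm) and age cohorts (weeks), for illustration only.
mt = [10.0, 11.0, 12.0, 11.0, 30.0, 20.0, 21.0, 22.0]
age = [4, 4, 4, 4, 4, 23, 23, 23]
keep = ~mad_outliers(mt, age)
```

Screening within each age cohort, rather than across all mice, avoids flagging values that are unusual only because MT changes with age.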

Data analysis
Linear models to predict bone properties from Raman spectral data were built using standard partial least squares regression (PLSR) [23], as described previously [8]. A leave-one-out cross-validation (LOOCV) approach was used to generate a Raman-based prediction of aBMD for each tibia via PLSR, and the root mean squared error of cross-validation (RMSECV) was calculated. In each iteration of the cross-validation, the rank of the PLSR model was selected using the method published by Haaland and Thomas [26] and was required to be smaller than one third of the sample size to prevent overfitting the calibration data [27]. The same procedure was applied to the prediction of vBMD.
To guard against flaws in the biomechanical measurements, MT values more than three median absolute deviations from the median of their age cohort were excluded as outliers. This criterion rejected two MT measurements, leaving 35 mice for analysis. Two different PLSR approaches were used. In the first, the entire set of mice was analyzed as a single group, as for aBMD and vBMD prediction. In the second, the spectra from 23-week-old mice were placed in a separate group from those of the younger mice, and a separate LOOCV PLSR model was developed for each group. The reasoning for this, related to chemical effects at different ages, is explained in the subsequent section.

Results
Figure 2 shows representative in vivo Raman spectra acquired over the tibia of a mouse. The three spectra were acquired from different rings (see Fig. 1(a)). Corresponding peaks in all three spectra have similar signal-to-noise ratios, as the larger number of fibers in ring 3 compensates for the larger per-fiber signal strength in ring 1. Prominent tissue and bone peaks can be found in all spectra, such as CH₂ (1450 cm⁻¹), Amide III (1243-1320 cm⁻¹), hydroxyproline (876 cm⁻¹), phosphate (959 cm⁻¹), and carbonate (1070 cm⁻¹) [28]. The relative strength of the mineral peaks increases with offset, as expected from the increased relative contribution from greater depths. Because ring 3 has the highest bone-to-soft-tissue signal ratio, its spectra were expected to be the most valuable for analysis of bone strength. Most of the subsequent analysis was therefore performed using the ring 3 spectra, although a comparison with ring 1 results is also presented.
Fig. 2. Transcutaneous Raman spectra acquired over one mouse tibia with different source-detector offsets, from 0 (Offset 1) to 0.5 mm (Offset 3), based upon the rings defined in Fig. 1(a). Spectra are normalized to the area under the peak at 1660 cm⁻¹.
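The normalization used for Fig. 2 and the increase of the mineral bands with offset can be illustrated with simple band-area integration. In this sketch the integration windows are hypothetical (the text gives only band positions), the band near 1660 cm⁻¹ is treated as amide I, and the Gaussian "spectra" stand in for background-corrected Offset 1 and Offset 3 spectra; none of these values come from the study.

```python
import numpy as np

# Hypothetical integration windows (cm^-1); the text gives only band positions.
AMIDE_I = (1630.0, 1690.0)    # band near 1660 cm^-1 used for normalization
PHOSPHATE = (940.0, 975.0)    # phosphate band near 959 cm^-1


def band_area(wavenumbers, spectrum, window):
    """Trapezoidal area of a background-corrected spectrum inside a wavenumber window."""
    lo, hi = window
    sel = (wavenumbers >= lo) & (wavenumbers <= hi)
    x, y = wavenumbers[sel], spectrum[sel]
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def normalize_to_band(wavenumbers, spectrum, window=AMIDE_I):
    """Scale a spectrum so that the band inside `window` has unit area."""
    return spectrum / band_area(wavenumbers, spectrum, window)


def mineral_to_matrix(wavenumbers, spectrum):
    """Phosphate-to-amide-I band-area ratio; rises as bone contributes more signal."""
    return (band_area(wavenumbers, spectrum, PHOSPHATE)
            / band_area(wavenumbers, spectrum, AMIDE_I))


# Synthetic Gaussian bands (hypothetical) standing in for Offset 1 and Offset 3.
wn = np.arange(800.0, 1800.0, 1.0)


def gaussian(center, width, height):
    return height * np.exp(-0.5 * ((wn - center) / width) ** 2)


offset1 = gaussian(959.0, 5.0, 1.0) + gaussian(1660.0, 10.0, 2.0)
offset3 = gaussian(959.0, 5.0, 3.0) + gaussian(1660.0, 10.0, 2.0)
print(mineral_to_matrix(wn, offset1), mineral_to_matrix(wn, offset3))
```

Normalizing to a collagen band makes spectra with different overall intensities comparable, so a larger phosphate area after normalization reflects a larger relative bone contribution rather than simply more collected light.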
In Fig. 2, the relative increase in the phosphate peak from Offset 1 to Offset 3 indicates the stronger relative contribution of bone at Offset 3.
Figures 3(a) and 3(b) depict the trends of aBMD and MT with age. It can be seen from Fig. 3 that aBMD keeps increasing across the entire range of ages, whereas MT does not increase further in the 23-week group. This MT trend is consistent with literature reports [29]. Figure 3(c) shows that the mean carbonate-to-phosphate band-area ratio in the Raman spectra (normalized to 1 at week 4) increased significantly in the 23-week-old mice (p<0.01 in t tests between the 23-week group and each other group), while the groups aged 4 to 12 weeks are comparable to each other. Different chemical trends therefore affect MT in this mouse population as a function of age. As juvenile mice grow from 4 to 20 weeks old, the total mineral content of the tibiae increases, which tends to elevate MT. Once the mice are mature, however, this increase halts, and carbonate substitution for phosphate is now observed, which tends to reduce MT. This supports the suggestion made in the Methods section above that age-range-specific PLSR models might predict MT better because of these different age-related chemical trends.
Figure 4 shows predictions of aBMD from regressions of transcutaneous (a) and exposed-bone (b) Raman spectra against reference values from DXA. Both plots show significant correlation between the measured and predicted aBMD (p<0.0001 for both, Pearson's test [30]). The RMSECV of the transcutaneous predictions is about 17% larger than that of exposed bone (0.007 versus 0.006 g/cm²). The accuracy of each prediction is quantified in Table 1, and the most common PLSR model ranks selected during the LOOCV process are listed in Table 2.
Fig. 4. (b) Correlation between predicted aBMD based on PLSR of Raman spectra measured on exposed bone and the reference values obtained by DXA.
Figure 5 shows the corresponding scatter plots for Raman-based predictions of vBMD versus reference values from microCT. Again, both plots show significant correlation between the measured and predicted vBMD (p<0.0001). The RMSECV of the transcutaneous predictions is 20% larger than that of exposed bone (35.7 versus 29.7 mg HA/ccm). The accuracy of each prediction is quantified in Table 1, and the typical PLSR ranks are listed in Table 2.
Fig. 5. (b) Correlation between predicted vBMD based on PLSR of Raman spectra measured on exposed bone and the reference values obtained by microCT.
Figure 6 shows the corresponding scatter plots for Raman predictions of MT versus reference values from the torsion test. The transcutaneous and exposed-bone plots both show significant correlation between the measured and predicted MT (p<0.0001). The RMSECV of the transcutaneous predictions is 18% smaller than that of exposed bone (2.7 versus 3.3 N·mm). These results are summarized in Table 1, and the typical PLSR ranks are listed in Table 2. In Fig. 6, the MT predictions for the 23-week-old subgroup show no significant correlation with the reference values. As noted above, this was anticipated because of chemical effects that appeared only in this age range. When a separate leave-one-out regression was performed that excluded the 23-week-old mice, the RMSECV was lower (2.2 versus 2.7 N·mm).
Fig. 6. (b) Correlation between predicted MT based on PLSR of Raman spectra measured on exposed bone and the reference values obtained by the bone torsion test.
Like Raman spectroscopy, microCT also provides a vector of numbers describing the region of inspection. For comparison with Raman spectroscopy, a separate regression was performed to predict MT using, instead of Raman spectra, four parameters derived from the microCT measurement (bone area, tissue area, vBMD, and cortical thickness) [8]. The scatter plot is shown in Fig. 7. The RMSECV is 1.5 N·mm (r² = 0.89, p<0.0001), compared with 2.5 N·mm for the transcutaneous Raman spectra. However, unlike the Raman case, a regression that excluded the 23-week group to predict the younger mice performed worse (RMSECV = 1.8 N·mm).
To highlight the influence of the spatial offset, we compared predictions made with Raman spectra from ring 1 instead of ring 3. Figure 8 shows the results. The two rings predicted aBMD equivalently (r² = 0.50 for ring 1 versus 0.53 for ring 3), but the correlation with measured MT was much weaker for ring 1 (r² = 0.19 versus 0.53).
Fig. 8. (a) Correlation between predicted aBMD based on PLSR of Raman spectra from ring 1 (red triangles) and ring 3 (blue circles), measured transcutaneously on live mice, and the reference values obtained by DXA. Correlation coefficients were similar for the two offsets. (b) Correlation between predicted MT based on PLSR of Raman spectra from ring 1 (red triangles) and ring 3 (blue circles), measured transcutaneously on live mice, and the reference values obtained by the torsion test. In this case, the correlation was noticeably stronger for the larger offset (r² = 0.53 versus 0.19).

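The correlation statistics quoted in these results (Pearson's test p-values and r²) can be computed from paired measured and cross-validated predicted values as sketched below with `scipy.stats.pearsonr`. The helper name and the demo values are hypothetical and only illustrate the calculation; they are not data from the study.

```python
import numpy as np
from scipy.stats import pearsonr


def correlation_summary(measured, predicted):
    """Pearson r, r^2, and two-sided p-value for predicted versus measured values."""
    r, p = pearsonr(np.asarray(measured, dtype=float),
                    np.asarray(predicted, dtype=float))
    return {"r": float(r), "r2": float(r) ** 2, "p": float(p)}


# Hypothetical measured and cross-validated predicted values, for illustration only.
measured = np.arange(10.0)
predicted = measured + np.array([0.1, -0.1] * 5)
print(correlation_summary(measured, predicted))
```

Because the predictions come from leave-one-out models, each predicted value was produced without its own reference value, so the correlation reflects out-of-sample performance rather than goodness of fit.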
Discussion
The goal of this study was to investigate the feasibility of using SORS in vivo to predict bone quality. Our results demonstrate that although the Raman spectra acquired transcutaneously from tibiae in live mice contain signal from the overlying soft tissue, the Raman-predicted bone quality metrics aBMD, vBMD, and MT correlate strongly with the values measured by DXA, microCT, and torsion testing. Moreover, transcutaneous in vivo measurements perform comparably to measurements of exposed bones ex vivo (RMSECV differences of about 20% or less for all three parameters). These results indicate the applicability of SORS for in vivo bone assessment.
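The RMSECV comparisons behind this statement are simple relative differences. The sketch below recomputes them from the values reported in the Results; because the aBMD values are given to only one significant figure, its percentage is approximate.

```python
# RMSECV values reported in the Results: (transcutaneous, exposed bone).
REPORTED_RMSECV = {
    "aBMD (g/cm^2)": (0.007, 0.006),
    "vBMD (mg HA/ccm)": (35.7, 29.7),
    "MT (N*mm)": (2.7, 3.3),
}


def percent_difference(transcutaneous, exposed):
    """Signed difference of the transcutaneous RMSECV relative to exposed bone, in percent."""
    return 100.0 * (transcutaneous - exposed) / exposed


for name, (t, e) in REPORTED_RMSECV.items():
    print(f"{name}: {percent_difference(t, e):+.1f}%")
# aBMD (g/cm^2): +16.7%
# vBMD (mg HA/ccm): +20.2%
# MT (N*mm): -18.2%
```

A negative value means the transcutaneous predictions were more accurate than the exposed-bone predictions, as was the case for MT.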
In addition, Raman spectroscopy has an advantage over X-ray methods for noninvasive estimation of bone strength because it is sensitive to chemical constituents other than mineral. As Fig. 3 indicates, mineral density increased steadily with age across all mice, but the increase in maximum torque halted between 12 and 23 weeks of age. The relationship between maximum torque and bone mineral density therefore has a complicated dependence that changes with age. As Fig. 7 shows, regressions to predict MT from four microCT parameters perform better than those using Raman spectroscopy alone (RMSECV of 1.5 vs. 2.5 N·mm). Since Raman spectroscopy can distinguish different mineral species (phosphate versus carbonate) as well as Type I collagen, combining Raman spectroscopy with CT may add information about bone quality and could potentially produce the most accurate predictions.
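The carbonate-sensitive comparison in Fig. 3(c) can be sketched as follows: per-mouse carbonate-to-phosphate band-area ratios (bands near 1070 and 959 cm⁻¹) are averaged per age group, scaled so that the 4-week group equals 1, and the 23-week group is compared with every other group by a two-sample t test. The text says only that t tests were used, so Student's equal-variance test (SciPy's default) is an assumption, as are the helper names and the per-mouse ratios in the demo.

```python
import numpy as np
from scipy.stats import ttest_ind


def normalized_group_means(ratios, ages, reference_age=4):
    """Mean ratio per age group, scaled so that the reference group equals 1."""
    ratios = np.asarray(ratios, dtype=float)
    ages = np.asarray(ages)
    ref = ratios[ages == reference_age].mean()
    return {int(a): float(ratios[ages == a].mean() / ref) for a in np.unique(ages)}


def pvalues_vs_group(ratios, ages, target_age=23):
    """Two-sample t-test p-values comparing the target age group with every other group.

    Student's equal-variance test is an assumption.
    """
    ratios = np.asarray(ratios, dtype=float)
    ages = np.asarray(ages)
    target = ratios[ages == target_age]
    return {int(a): float(ttest_ind(target, ratios[ages == a]).pvalue)
            for a in np.unique(ages) if a != target_age}


# Hypothetical per-mouse carbonate/phosphate ratios, for illustration only.
ages = [4] * 4 + [8] * 4 + [12] * 4 + [23] * 4
ratios = [0.20, 0.21, 0.19, 0.20, 0.20, 0.19, 0.21, 0.20,
          0.21, 0.20, 0.20, 0.19, 0.26, 0.27, 0.25, 0.26]
print(normalized_group_means(ratios, ages))
print(pvalues_vs_group(ratios, ages))
```

A ratio of this kind is invisible to X-ray density measurements, which register total mineral but not how much carbonate has substituted for phosphate.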
The comparison of spatial offsets in Fig. 8 showed that the greater offset of Ring 3 led to more accurate predictions of MT but no improvement for aBMD. This difference probably arises because the overlying soft tissue's Raman spectrum is dominated by Type I collagen and contains no phosphate or carbonate peaks. As such, the reduced soft-tissue contribution at larger offsets should have no influence upon the estimation of aBMD, which is a mineral-specific parameter. Maximum torque, however, depends partly upon the ratio of mineral to matrix in the bone itself. The collagen signal from soft tissue alters the measured ratio of mineral to Type I collagen and thus obscures the true ratio in the bone alone, making MT prediction more difficult. Reducing the soft tissue's collagen signature in the Ring 3 spectra reduced this obscuration, presumably leading to the more accurate predictions observed at this greater offset. Although Ring 3 maximized the signal contribution from bone, its spectra are still a composite of all layers, so the influence of soft tissue is not eliminated. If spectral contributions from soft tissue were further suppressed by post-processing methods, such as band-target entropy minimization [31] or simultaneous overconstrained library-based decomposition [18], the accuracy could potentially be increased further.
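The dilution argument above can be made explicit with a simple two-layer mixing model. Assume, for illustration only, that measured band areas are the sum of bone and soft-tissue contributions and that soft tissue contributes no phosphate, as stated above. Let $A_{\mathrm{P}}$ denote the phosphate band area and $A_{\mathrm{C}}$ a collagen band area; these symbols and the ratio $R$ are introduced here and are not taken from the study:

$$
R_{\text{meas}}
= \frac{A_{\mathrm{P}}^{\text{bone}}}{A_{\mathrm{C}}^{\text{bone}} + A_{\mathrm{C}}^{\text{soft}}}
= \frac{R_{\text{bone}}}{1 + A_{\mathrm{C}}^{\text{soft}} / A_{\mathrm{C}}^{\text{bone}}},
\qquad
R_{\text{bone}} = \frac{A_{\mathrm{P}}^{\text{bone}}}{A_{\mathrm{C}}^{\text{bone}}}.
$$

In this model the measured mineral-to-matrix ratio underestimates the bone ratio by a factor that depends on the soft-tissue-to-bone collagen signal, which varies between animals. A larger offset lowers $A_{\mathrm{C}}^{\text{soft}} / A_{\mathrm{C}}^{\text{bone}}$ and brings $R_{\text{meas}}$ closer to $R_{\text{bone}}$, whereas the phosphate area alone contains no soft-tissue term.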
Changes in the chemometric modeling might also improve the accuracy of our approach. This is especially true for MT, because it has a more complex dependence upon chemical composition than aBMD and vBMD. As noted above, in this study we explored separate regressions to predict MT for different age groups (4-12 weeks and 23 weeks) because of the changing relationship between MT and relative carbonate concentration at different ages (see Fig. 3). We expected that linear regression methods such as PLS might have trouble building a robust model for direct prediction of MT under those conditions. Indeed, regression using one PLSR model for all the mice resulted in a higher RMSECV (2.7 instead of 2.2 N·mm). As a result of having to split the mice into two groups, the regression for the 23-week group suffered from the small sample size (eight total mice), leading to uncorrelated predictions within this group. While this could be mitigated by obtaining more data from older mice, a nonlinear regression method might help to generate a more accurate single prediction model in the future.
It is also possible that uncertainties in the reference MT measurement (i.e. the torsion test) limit the accuracy of the Raman predictions. Although the tibiae were cemented with the same length exposed, their shapes differed with age; for example, tibiae from older mice were straighter than those from younger mice. This may affect the consistency of testing across groups. In future work, the accuracy and reproducibility of the torsion test could be explored using synthetically engineered samples whose properties are nominally identical.
This study is the first to demonstrate the ability of transcutaneous in vivo Raman spectroscopy to predict anatomical and biomechanical bone properties in mice as measured by microCT, DXA, and torsion tests. The data presented here were drawn from mice spanning a range of ages. Interesting future directions include applying SORS longitudinally to monitor bone changes during the development of osteoporosis and bone recovery after clinical treatment.

Conclusions
To summarize, this study suggests that Raman spectroscopy can be used to non-invasively and accurately predict the strength of mouse bones in vivo. To our knowledge, this is the first report of Raman spectroscopy producing estimates of maximum torque in vivo that correlate significantly with reference measurements. The data in this study support the potential of Raman spectroscopy for the investigation of bone quality.