Tissue optical properties combined with machine learning enables estimation of articular cartilage composition and functional integrity

Absorption and reduced scattering coefficients (µa, µ′s) of biological tissues have shown significant potential in biomedical applications, as they are effective parameters for characterizing tissue integrity and provide vital information on tissue health. This study investigates the potential of optical properties (µa, µ′s) for estimating articular cartilage composition and biomechanical properties using multivariate and machine learning techniques. The results suggest that µa could optimally estimate cartilage proteoglycan content in the superficial zone, as well as the equilibrium modulus, whereas µ′s could effectively estimate the proteoglycan content of the middle and deep zones, as well as the instantaneous and dynamic moduli of articular cartilage.

Thus, complementary approaches that can augment traditional arthroscopy with quantitative and objective tissue characterization in real time during arthroscopic surgery would be of great significance, particularly for assessment of tissue integrity [9,10].
Optical spectroscopic methods, including mid-infrared (MIR), near-infrared (NIR), and Raman spectroscopies, are promising methods for quantitative evaluation of the integrity of biological tissues, such as articular cartilage [11,12]. Rieppo et al. [13] and Olumegbon et al. [14] have extensively reviewed the potential and current state of these methods for evaluating the composition, structure, and functional integrity of articular cartilage. In general, these vibrational spectroscopic methods are based on light absorption (NIR and MIR spectroscopy) and scattering (Raman spectroscopy) phenomena [4]. As in other biological tissues, the fundamental molecular constituents of articular cartilage contain C-H, O-H, N-H, and S-H bonds, which are responsible for the absorption or scattering signature of biological tissues in the visible/infrared spectral range [15]. Thus, these optical methods are promising tools for improving the accuracy of arthroscopy by providing molecular-level information for discriminating between healthy and diseased tissues.
Although these optical methods are effective estimators of the integrity of connective tissues, such as articular cartilage [16][17][18], each is based exclusively on either the absorption or the scattering of light and cannot explicitly distinguish the contributions of different substances to light-tissue interaction. In contrast, the absorption and reduced scattering coefficients (µa and µ′s) of biological tissues in the NIR spectral range (800-2500 nm), obtained via experimental methods such as diffuse optical spectroscopy, incorporate the intensity of both reflected and transmitted light to describe how much light is absorbed and scattered by tissue constituents [19]. Light in specific NIR spectral ranges penetrates deep (1-10 mm) into soft tissues owing to high forward scattering and low photon absorption [15]. As a result, NIR optical properties have been suggested as promising parameters for diagnostic assessment of several tissues, such as brain, breast, skin, muscle, and finger joints [20]. Since different constituents of biological tissues affect light absorption and scattering to varying degrees, tissue optical properties could provide an effective means of assessing molecular alterations in specific tissue components. Thus, changes in tissue optical properties could provide critical diagnostic information on component-specific alterations in tissue composition, structure, and functional integrity. This study is based on the hypothesis that optical properties are effective estimators of cartilage composition and functional integrity. To test this hypothesis, we investigate the potential of optical properties for estimating cartilage proteoglycan (PG) content and biomechanical properties using multivariate and machine learning algorithms.
Like partial least squares (PLS), a common chemometric technique, machine learning methods such as support vector regression (SVR) and random forests (RF) have demonstrated significant potential for the analysis of high-dimensional data, such as spectroscopic data [21,22].

Sample preparation
In this study, intact bovine knee joints (n=15) were obtained from a local abattoir within one week of slaughter. Prior to sample harvesting, the joints were kept intact in vacuum-sealed bags and stored at 4 °C to avoid degradation. Osteochondral (cartilage-on-bone) samples (20 × 10 × 10 mm³, n=56) were extracted from the upper and lower sections of the lateral and medial facets of the patellae. Since the physiological properties of articular cartilage are site-dependent and vary across the patella, each sample was divided into two specimens, A and B (10 × 10 × 10 mm³), to minimize the difference in location between the optical measurements and the biomechanical and histological assessments performed on each sample. Cartilage thickness of both specimens was estimated as the mean of the thicknesses measured on all sides of the specimen, from the articular surface to the cartilage-subchondral bone interface, using an optical microscope (Zeiss, STEMI, SV8, Germany) [23]. Immediately after thickness measurement, specimen B was stored at −20 °C in phosphate-buffered saline (PBS, pH 7.4) for subsequent biomechanical testing and histological assessment, while specimen A was processed for measurement of tissue optical properties.
The underlying subchondral bone of specimen A was removed using a bandsaw, and the remaining bone was filed using sandpaper (Mirox P80, Mikra Oy, Uusikaarlepyy, Finland) until it was visually negligible. The resulting cartilage layer was then sandwiched between a glass slide (thickness = 1 mm, Menzel-Gläser Frosted Microscope Slides, ThermoFisher Scientific Oy, Finland) and a coverslip (thickness = 0.13 mm, Menzel Microscope Coverslips, ThermoFisher Scientific Oy, Finland) for optical measurement. To minimize dehydration, the sandwiched samples were stored for 12 hours in a humid box at 4 °C until measurement. During patella harvesting and sample preparation, the articular surface was kept moist with PBS supplemented with protease inhibitors [24].

Optical measurement and estimation of cartilage optical properties
Prior to optical measurement, the sandwiched cartilage samples were kept at room temperature for 15 minutes. Reflectance and transmittance were then measured over the spectral range of 600 to 2500 nm at a step size of 5 nm using a spectrophotometer with a built-in integrating sphere (Lambda 1050, PerkinElmer Inc., Waltham, USA). Briefly, this research-grade instrument is based on dispersive spectroscopy and records the reflectance and transmittance of materials at each wavelength. It incorporates a 150-mm snap-in single integrating sphere with a 20-mm transmittance port and a 25-mm reflectance port; integrating-sphere measurement is a standard method for determining the reflectance and transmittance of materials. The Lambda 1050 features deuterium and tungsten-halogen light sources, a photomultiplier tube (PMT, R6872) detector covering the ultraviolet and visible spectral ranges (resolution ≤ 0.05 nm), and a combination of Peltier-cooled InGaAs and temperature-stabilized PbS detectors covering the NIR spectral range (resolution ≤ 0.05 nm). The combined NIR detectors provide a 30-fold improvement in signal-to-noise ratio over 900-2500 nm compared with thermostated PbS detectors. The instrument switches detectors at 860 nm [25]. All optical measurements were performed in an isolated dark room.
Using the measured transmittance and reflectance, the sample thickness, and a refractive index obtained from the literature [26], µa and µ′s of articular cartilage were estimated using the inverse adding-doubling (IAD) technique [27]. IAD iteratively searches for the optical properties that reproduce the measured reflectance and transmittance. The method consists of the following steps: 1) generate an initial set of optical properties from a crude fit of the measured reflectance and transmittance; 2) solve the radiative transport equation [28] for this set of optical properties using the adding-doubling technique [29] to calculate the corresponding reflectance and transmittance; 3) compare the calculated values with the measured data using the metric M, the sum of the relative errors (Eq. (1)):

$$M = \frac{\lvert R_{\mathrm{meas}} - R_{\mathrm{calc}} \rvert}{R_{\mathrm{meas}} + 10^{-6}} + \frac{\lvert T_{\mathrm{meas}} - T_{\mathrm{calc}} \rvert}{T_{\mathrm{meas}} + 10^{-6}} \qquad (1)$$

where R_meas and T_meas are the measured reflectance and transmittance, and R_calc and T_calc are the calculated reflectance and transmittance, respectively; 4) iterate until a match is found, using a minimization algorithm based on the downhill simplex method of Nelder and Mead [30]. The 10⁻⁶ term in each denominator of Eq. (1) prevents the error from becoming excessively large when the measured reflectance or transmittance is particularly low. Prahl et al. [27] assessed the accuracy of the IAD method using materials with known reflectance and transmittance: they estimated the absorption and reduced scattering coefficients as functions of reflectance and transmittance, perturbed the reflectance and transmittance, and calculated the resulting absolute and relative errors in the estimated optical properties. According to their study, the maximum relative errors in the absorption and reduced scattering coefficients are ≤ 10% and ≤ 6%, respectively.
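The search loop in steps 1)-4) can be sketched in Python with SciPy's Nelder-Mead implementation. This is a minimal illustration, not the IAD program of Prahl et al.: the adding-doubling solver is replaced by a hypothetical toy forward model (`toy_forward`) that merely maps (µa, µ′s) to (R, T), so only the metric of Eq. (1) and the simplex search reflect the method described above. Searching in log space (to keep both coefficients positive) and restarting the simplex a few times are choices of this sketch, not of the original method.

```python
import numpy as np
from scipy.optimize import minimize


def iad_metric(r_meas, t_meas, r_calc, t_calc):
    """Metric M of Eq. (1): summed relative errors in reflectance and transmittance."""
    return (abs(r_meas - r_calc) / (r_meas + 1e-6)
            + abs(t_meas - t_calc) / (t_meas + 1e-6))


def toy_forward(mu_a, mu_s_prime, thickness_mm):
    """HYPOTHETICAL placeholder for the adding-doubling solver (not a physical model)."""
    r_calc = 0.5 * mu_s_prime / (mu_a + mu_s_prime)
    t_calc = np.exp(-(mu_a + mu_s_prime) * thickness_mm)
    return r_calc, t_calc


def invert(r_meas, t_meas, thickness_mm, forward=toy_forward, guess=(0.2, 3.0)):
    """Search for (mu_a, mu_s') whose calculated R and T match the measured values."""
    def objective(log_params):
        mu_a, mu_s_prime = np.exp(log_params)  # log space keeps both coefficients positive
        r_calc, t_calc = forward(mu_a, mu_s_prime, thickness_mm)
        return iad_metric(r_meas, t_meas, r_calc, t_calc)

    x = np.log(np.asarray(guess, dtype=float))  # step 1: crude initial guess
    for _ in range(5):  # steps 2-4, restarted from the best point found so far
        res = minimize(objective, x, method="Nelder-Mead",
                       options={"xatol": 1e-10, "fatol": 1e-12, "maxiter": 4000})
        x = res.x
    mu_a, mu_s_prime = np.exp(x)
    return float(mu_a), float(mu_s_prime)


# Usage: simulate "measured" R and T for known coefficients (1/mm), then invert them.
r, t = toy_forward(0.5, 1.5, thickness_mm=1.0)
mu_a_est, mu_s_est = invert(r, t, thickness_mm=1.0)
```

In a real analysis, `forward` would be an adding-doubling solver; the metric and the search loop stay the same.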
Furthermore, the penetration depth (PD, mm) of photons over the full NIR spectral range was determined using Eq. (2) to estimate how deep NIR photons penetrate into articular cartilage. Owing to inherent detector noise and the high water content of articular cartilage (approximately 80% [21]), the spectral range of 2350 to 2500 nm was excluded from this study.
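The text does not spell out Eq. (2). A common definition of penetration depth in the diffusion approximation is PD = 1/µeff, with µeff = sqrt(3µa(µa + µ′s)); the sketch below assumes that definition, which may differ from the paper's Eq. (2). It also applies the exclusion of the 2350-2500 nm band to the 600-2500 nm, 5-nm measurement grid; keeping the 2350 nm point itself (the upper edge of the 4th NIR window) is likewise an assumption.

```python
import numpy as np

# Measurement grid: 600-2500 nm in 5-nm steps.
wavelengths = np.arange(600, 2505, 5)

# Exclude the 2350-2500 nm band (2350 nm itself is kept here; assumption).
analysed = wavelengths <= 2350


def penetration_depth(mu_a, mu_s_prime):
    """ASSUMED diffusion-theory depth 1/mu_eff; inputs in 1/mm give PD in mm."""
    mu_a = np.asarray(mu_a, dtype=float)
    mu_s_prime = np.asarray(mu_s_prime, dtype=float)
    mu_eff = np.sqrt(3.0 * mu_a * (mu_a + mu_s_prime))
    return 1.0 / mu_eff


# Usage with hypothetical spectra sampled on the same grid:
# pd_mm = penetration_depth(mu_a_spectrum[analysed], mu_s_spectrum[analysed])
```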

Biomechanical testing
To assess cartilage functional integrity, the B specimens were thawed to room temperature and subjected to biomechanical testing to determine their instantaneous (E_inst), equilibrium (E_equ), and dynamic (E_dyn, at 1.0 Hz) moduli using a Mach-1 micromechanical testing system (V500css, Biomomentum Inc., Laval, Canada). The system is equipped with a cylindrical plane-ended indenter (diameter = 1 mm) and a multi-axis force/torque transducer (Nano17, ATI Industrial Automation Inc., Apex, USA). The testing protocol comprised indentation-based stress-relaxation and sinusoidal loading. Prior to measurement, the bone end of each sample was filed parallel to the articular surface and glued to the measurement chamber, which was filled with PBS. A pre-stress of 12.5 kPa and 5 cycles of preconditioning with a strain amplitude of 2% were applied to ensure adequate cartilage-indenter contact. The stress-relaxation protocol consisted of 3 steps, each with a strain amplitude of 5% and a strain rate of 100%/s, with a 15-minute relaxation period after each step. Ten cycles of sinusoidal loading with a strain amplitude of 2% at 1.0 Hz were applied immediately after the stress-relaxation measurement [31,32]. The sampling frequency for these measurements was 100 Hz.
The instantaneous modulus was estimated as the ratio of the peak stress in the 2nd step to the strain amplitude. The equilibrium stress of each step was taken as the mean stress during the last 5 seconds of that step, and the equilibrium modulus was estimated from the slope of a linear fit to the 3 resulting equilibrium points. The dynamic modulus was estimated as the ratio of the stress and strain amplitudes during sinusoidal loading. Cartilage was assumed to behave as an elastic, isotropic material during instantaneous and dynamic loading and at equilibrium; thus, these parameters were calculated using Hayes' model [33]. Poisson's ratios of 0.5, 0.1, and 0.5 were assumed for the instantaneous, equilibrium, and dynamic moduli, respectively [34].
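The modulus calculations above can be sketched as follows. The helper names are hypothetical, stresses and strains are assumed to be already extracted from the 100 Hz recordings, and the Hayes correction uses the standard plane-ended-indenter relation E = πa(1 − ν²)σ / (2κhε), where a is the indenter radius, h the cartilage thickness, and κ the tabulated correction factor of Hayes et al. Because κ depends on a/h and ν, it must be looked up and supplied by the user rather than computed here.

```python
import numpy as np


def equilibrium_stress(time_s, stress, step_end_s, window_s=5.0):
    """Mean stress over the last `window_s` seconds of a relaxation step."""
    time_s = np.asarray(time_s, dtype=float)
    stress = np.asarray(stress, dtype=float)
    in_window = (time_s > step_end_s - window_s) & (time_s <= step_end_s)
    return stress[in_window].mean()


def equilibrium_slope(cumulative_strains, eq_stresses):
    """Slope of the linear fit to the equilibrium stress-strain points."""
    slope, _intercept = np.polyfit(cumulative_strains, eq_stresses, 1)
    return slope


def amplitude(signal):
    """Half peak-to-peak amplitude of a sinusoidal signal."""
    signal = np.asarray(signal, dtype=float)
    return 0.5 * (signal.max() - signal.min())


def hayes_modulus(stress_over_strain, indenter_radius, thickness, poisson, kappa):
    """Young's modulus from sigma/epsilon using Hayes' correction (kappa from tables)."""
    factor = np.pi * indenter_radius * (1.0 - poisson**2) / (2.0 * kappa * thickness)
    return factor * stress_over_strain


# Hypothetical usage (stress in MPa, strain as a fraction, lengths in mm):
# e_inst = hayes_modulus(peak_stress_step2 / 0.05, 0.5, h, 0.5, kappa_inst)
# e_equ = hayes_modulus(equilibrium_slope(strains, eq_stresses), 0.5, h, 0.1, kappa_equ)
# e_dyn = hayes_modulus(amplitude(stress_sin) / amplitude(strain_sin), 0.5, h, 0.5, kappa_dyn)
```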

Histological assessment
Following biomechanical testing, the B specimens underwent histological assessment to quantify cartilage PG content. The samples were fixed and decalcified in a solution containing formaldehyde (4%, Merck, Darmstadt, Germany) and ethylenediaminetetraacetic acid (EDTA, 10%, Merck, Darmstadt, Germany) at room temperature for 21 days. Following decalcification, the samples were processed and embedded in paraffin for sectioning [35]. Histological sections 3 µm thick were cut perpendicular to the articular surface and stained with Safranin O [35], a cationic dye that binds stoichiometrically to the negatively charged groups of the glycosaminoglycan chains of PG macromolecules in cartilage [35]. Grayscale images of the stained sections were acquired using a conventional light microscope (Nikon Microphot FXA, Nikon Co., Tokyo, Japan) and a charge-coupled device (CCD) camera (Hamamatsu ORCA-ER, Hamamatsu Photonics, Hamamatsu, Japan). The optical density of the Safranin O-stained sections was then used to estimate the depth-wise PG content of the samples via the Beer-Lambert law [36]. PG content was estimated for the superficial (SZ), middle (MZ), and deep (DZ) zones, corresponding to 0-10%, 10-30%, and 30-100% of cartilage thickness from the articular surface to the tidemark, and the average optical density in each zone was used in the statistical and regression analyses.
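The conversion from grayscale images to zone-wise PG estimates can be sketched as below. This is an illustration under stated assumptions: the background (unstained) intensity I0 is a single known value, each image row corresponds to one depth from the articular surface to the tidemark, and zone membership is assigned at each row's mid-depth. Calibration details of the actual imaging system are not given in the text.

```python
import numpy as np

# Zone boundaries as fractions of cartilage thickness (surface = 0, tidemark = 1).
ZONES = {"SZ": (0.0, 0.10), "MZ": (0.10, 0.30), "DZ": (0.30, 1.0)}


def optical_density(image, background_intensity):
    """Beer-Lambert optical density, OD = log10(I0 / I), for every pixel."""
    image = np.clip(np.asarray(image, dtype=float), 1e-6, None)  # avoid log of zero
    return np.log10(background_intensity / image)


def zone_means(od_image):
    """Average OD per zone; rows run from the articular surface to the tidemark."""
    profile = np.asarray(od_image, dtype=float).mean(axis=1)   # depth-wise OD profile
    depth = (np.arange(profile.size) + 0.5) / profile.size      # normalized mid-row depth
    return {name: float(profile[(depth >= lo) & (depth < hi)].mean())
            for name, (lo, hi) in ZONES.items()}
```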

Data analysis
As the collected data could contain noise or errors arising from faults in sample preparation, instrument function, or histological processing, a series of outlier detection techniques was employed to detect and remove potentially faulty data prior to the development of regression models. Specifically, the univariate boxplot test [37] was used to detect outliers in the PG content in SZ, MZ, and DZ, and in the instantaneous, equilibrium, and dynamic moduli of articular cartilage. The multivariate techniques elliptic envelope (ELEN [38]), one-class support vector machines (OCSVM [39]), isolation forests (ISOFOR [40]), and local outlier factor (LOF [41]) were used to detect outliers in the µa and µ′s of articular cartilage. For these multivariate methods, an outlier percentage of 25% was assumed.
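Assuming the scikit-learn implementations of the four multivariate detectors (the text does not name a library, although the use of GridSearchCV suggests scikit-learn), the 25% outlier percentage maps onto the `contamination` parameter, and approximately onto `nu` for the one-class SVM, as in this sketch. The function name `detect_outliers` is hypothetical. The minimum covariance determinant behind ELEN needs more samples than features, so full-resolution spectra may first require dimensionality reduction; the text does not say how this was handled.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM


def detect_outliers(features, method, contamination=0.25, random_state=0):
    """Return a boolean mask that is True for samples flagged as outliers."""
    if method == "ELEN":
        model = EllipticEnvelope(contamination=contamination, random_state=random_state)
    elif method == "OCSVM":
        # nu upper-bounds the fraction of training samples treated as outliers.
        model = OneClassSVM(nu=contamination)
    elif method == "ISOFOR":
        model = IsolationForest(contamination=contamination, random_state=random_state)
    elif method == "LOF":
        model = LocalOutlierFactor(contamination=contamination)
    else:
        raise ValueError(f"unknown method: {method}")
    labels = model.fit_predict(features)  # +1 = inlier, -1 = outlier
    return labels == -1


# Usage: keep only inliers before model development (X: samples x features).
# keep = ~detect_outliers(X, "ISOFOR"); X_clean, y_clean = X[keep], y[keep]
```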
A boxplot summarizes a population by its minimum, lower quartile (Q1), median, upper quartile (Q3), and maximum. In this approach, samples outside a certain threshold are considered outliers, where the threshold is defined as

$$\left[\,Q_1 - 1.5\,\mathrm{step},\;\; Q_3 + 1.5\,\mathrm{step}\,\right]$$

and the step is the interquartile range, Q3 − Q1 [37]. The ELEN method is based on the minimum covariance determinant technique; its objective is to find the h observations (out of n) whose covariance matrix has the lowest determinant [38]. The OCSVM method extends the support vector machine technique to unsupervised learning: the algorithm estimates a binary function g that is nonzero on the region of an unlabeled input space S where most of the samples lie and zero on its complement [39]. The LOF algorithm originates from knowledge discovery in databases and is related to density-based clustering. LOF extends the binary notion of being an outlier to a relative degree of isolation of an object from its surrounding neighbors: it uses the k-nearest neighbors of an observation to measure how much its local density deviates from that of those neighbors. A normal observation has a local density similar to that of its neighbors, whereas an outlier has a much smaller local density than its neighbors [41].
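A minimal NumPy version of the boxplot rule, assuming Tukey's conventional fences at 1.5 interquartile ranges beyond the quartiles (the multiplier is exposed as the hypothetical parameter `k`):

```python
import numpy as np


def boxplot_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


# Usage: only the last value (100) falls outside the fences here.
print(boxplot_outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
```

In this example Q1 = 3.25 and Q3 = 7.75, so the fences are -3.5 and 14.5.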
In contrast to the aforementioned methods, the ISOFOR technique isolates anomalies explicitly rather than profiling normal observations. It exploits the fact that outliers are few and different, which makes them more susceptible to isolation: the method builds an ensemble of random isolation trees, and samples with the shortest average path length are considered outliers [40]. Once the outliers in the measured optical properties, PG content, and biomechanical properties of articular cartilage were detected and removed using these methods, multivariate and machine learning methods were used to develop regression models to evaluate how effectively the optical properties predict cartilage composition and functional integrity across the full NIR spectral range and the 1st (650-950 nm), 2nd (1100-1350 nm), 3rd (1600-1870 nm), and 4th (2100-2350 nm) NIR windows [42]. Specifically, PLS, RF, and SVR algorithms were used to develop the regression models.
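Selecting the spectral windows used for model development amounts to masking the wavelength axis. The sketch below assumes the optical-property spectra are stored as a samples × wavelengths array on the 5-nm measurement grid; `NIR_WINDOWS` and `window_features` are hypothetical names, and treating the window edges as inclusive is an assumption.

```python
import numpy as np

# NIR windows (nm) as listed in the text; edges treated as inclusive.
NIR_WINDOWS = {
    "W1": (650, 950),
    "W2": (1100, 1350),
    "W3": (1600, 1870),
    "W4": (2100, 2350),
}


def window_features(wavelengths, spectra, window):
    """Return the columns of `spectra` (samples x wavelengths) inside one NIR window."""
    lo, hi = NIR_WINDOWS[window]
    in_window = (wavelengths >= lo) & (wavelengths <= hi)
    return spectra[:, in_window]


wavelengths = np.arange(600, 2355, 5)           # analysed grid, 600-2350 nm
spectra = np.zeros((56, wavelengths.size))      # placeholder mu_a or mu_s' spectra
print({name: window_features(wavelengths, spectra, name).shape[1] for name in NIR_WINDOWS})
# {'W1': 61, 'W2': 51, 'W3': 55, 'W4': 51}
```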
PLS is a widely used regression technique in multivariate analysis [43]. It decomposes the predictor matrix X (n×p) and the reference-variable matrix Y (n×q) into score matrices T and U (each n×m, where the number of components m is a user-defined hyperparameter), chosen so that the covariance between corresponding X and Y scores is maximized; the regression is then performed on these components [44]. PLS is designed to overcome the limitations of ordinary linear regression when predictors are highly correlated or when the number of predictors (p) greatly exceeds the number of observations (n) [43]. The RF algorithm is a supervised learning procedure based on the divide-and-conquer principle [45]. It draws random subsamples of the observed data, grows M (user-defined) randomized decision trees, one on each subsample, and aggregates their predictions by averaging, following the bagging principle [46] rather than the sequential reweighting used in boosting [47]. RF is one of the most popular ensemble techniques for regression owing to features such as its variable importance measure, out-of-bag error, and proximities [48], and it is recognized for its high performance on small datasets with high-dimensional feature spaces. SVR is the regression counterpart of support vector machines [49,50]. It uses kernel functions to map the data into a high-dimensional feature space, in which it fits a function that is as flat as possible while keeping most samples within an ε-insensitive tube around it; this flatness requirement plays the role of margin maximization in support vector classification. The samples lying on or outside the tube boundaries are called support vectors, and only they determine the fitted function.
The most common kernel functions used for nonlinear datasets include the Gaussian radial basis function (RBF), sigmoid function (tanh), and polynomials [49].
To avoid overly optimistic (biased) performance estimates, a combination of cross-validation and a blind test set was used to evaluate the regression models [51,52]. The osteochondral samples were randomly split into a cross-validation set (67%; n_joints = 10, n_samples = 38) and a blind test set (33%; n_joints = 5, n_samples = 18), such that all samples from a given joint belonged exclusively to either the cross-validation set or the blind test set. The cross-validation set was used for model development, testing, and hyperparameter optimization, a critical step for regressors such as PLS, RF, and SVR, which have several hyperparameters. For this purpose, the GridSearchCV algorithm [53], an exhaustive fit-and-score method based on k-fold cross-validation [54], was used. GridSearchCV split the cross-validation set into 5 consecutive folds (5-fold cross-validation), with each fold used once as the test set and the remaining 4 folds as the training set, and scored the regression model for each hyperparameter combination using the mean squared error and R² metrics. This procedure was repeated over a user-defined grid of hyperparameters to determine the optimal hyperparameters for the PLS, RF, and SVR regression models.
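The joint-wise split and the grid search can be sketched with scikit-learn, assuming that GridSearchCV [53] refers to its class of that name. The data, joint labels, and hyperparameter grid below are placeholders (the grid merely brackets values reported in the Results), so the resulting split sizes will not match 38/18 exactly. `GroupShuffleSplit` with `test_size=5` holds out five whole joints, and `KFold(n_splits=5)` without shuffling gives 5 consecutive folds. Whether the inner folds were also grouped by joint is not stated in the text, so this sketch does not group them.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, GroupShuffleSplit, KFold
from sklearn.svm import SVR

rng = np.random.default_rng(0)
n_samples, n_joints = 56, 15
joint_id = np.arange(n_samples) % n_joints          # placeholder joint labels
X = rng.normal(size=(n_samples, 50))                # placeholder spectra
y = X[:, 0] + 0.1 * rng.normal(size=n_samples)      # placeholder reference variable

# Blind test set: 5 whole joints; the remaining 10 joints form the cross-validation set.
splitter = GroupShuffleSplit(n_splits=1, test_size=5, random_state=0)
cv_idx, test_idx = next(splitter.split(X, y, groups=joint_id))

param_grid = {
    "kernel": ["rbf", "sigmoid"],
    "C": [1e1, 1e2, 1e3],
    "gamma": [2.5e-4, 5e-4, 5e-3],
}
search = GridSearchCV(
    SVR(),
    param_grid,
    cv=KFold(n_splits=5),                            # 5 consecutive folds, no shuffling
    scoring={"mse": "neg_mean_squared_error", "r2": "r2"},
    refit="mse",                                     # select hyperparameters by MSE
)
search.fit(X[cv_idx], y[cv_idx])
y_pred = search.best_estimator_.predict(X[test_idx])  # blind-test predictions
print(search.best_params_)
```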
Relying on cross-validation alone carries a risk of overfitting; therefore, the blind test set was used to evaluate how well the optimal regression models performed on unseen data. In addition, the Shapiro-Wilk normality test and Pearson's and Spearman's correlation tests [55] were used to verify whether the regression models could successfully predict the reference variables of the blind test set. The Shapiro-Wilk test, whose null hypothesis is that the variable is normally distributed, was used to check the distributions of the measured and predicted reference variables; a variable was considered normally distributed when p > 0.05. If both the measured and predicted reference variables were normally distributed, Pearson's correlation test was used to evaluate the correlation between them; otherwise, Spearman's correlation test was used. For both correlation tests, the null hypothesis is that the measured and predicted values are uncorrelated, and it was rejected when p < 0.05. The effectiveness of the optical properties for estimating cartilage composition and functional integrity was assessed using the coefficient of determination (R²), Pearson's (or Spearman's) correlation coefficient (ρ), and the root mean squared error normalized by the range of the reference variable (nRMSE).
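The blind-test evaluation can be sketched with SciPy. The function name `evaluate` is hypothetical, and R² is computed here as the coefficient of determination of the predictions (1 − SS_res/SS_tot); this is an assumption, since the text does not state whether R² was computed this way or as a squared correlation.

```python
import numpy as np
from scipy import stats


def evaluate(y_true, y_pred, alpha=0.05):
    """Correlation test chosen by Shapiro-Wilk normality, plus R^2 and nRMSE."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)

    # Pearson only if both variables pass the normality test, otherwise Spearman.
    _, p_true = stats.shapiro(y_true)
    _, p_pred = stats.shapiro(y_pred)
    both_normal = p_true > alpha and p_pred > alpha
    name, test = ("pearson", stats.pearsonr) if both_normal else ("spearman", stats.spearmanr)
    rho, p_value = test(y_true, y_pred)

    residuals = y_true - y_pred
    r2 = 1.0 - np.sum(residuals**2) / np.sum((y_true - y_true.mean()) ** 2)
    nrmse = np.sqrt(np.mean(residuals**2)) / (y_true.max() - y_true.min())
    return {"test": name, "rho": float(rho), "p_value": float(p_value),
            "significant": bool(p_value < alpha), "r2": float(r2), "nrmse": float(nrmse)}
```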

Optical characteristics of articular cartilage
The optical properties of articular cartilage over the full NIR range and the 1st, 2nd, 3rd, and 4th NIR windows are presented in Fig. 1(A). The sharp increase in the optical properties of articular cartilage over the 1900-2350 nm range [Fig. 1(A)] is due to the presence of a water absorption band. The penetration depth of photons traversing the tissue in these spectral ranges is shown in Fig. 1(B).
Fig. 1. A) Absorption (µa) and reduced scattering (µ′s) coefficients of articular cartilage; W1, W2, W3, and W4 denote the 1st, 2nd, 3rd, and 4th NIR windows, respectively. B) Penetration depth of photons in articular cartilage over the NIR spectral range and its windows.

Articular cartilage biomechanical properties
The experimental values obtained from the stress-relaxation and sinusoidal loading measurements are shown in Fig. 2(A), and the corresponding biomechanical response of articular cartilage is presented in Fig. 2(B).

Articular cartilage PG content
Light microscopy images of the Safranin O-stained histological sections [Fig. 3(A)] show variation in PG content from the articular surface to the cartilage-subchondral bone interface. The PG content of articular cartilage in SZ, MZ, and DZ, together with the individual observations, is illustrated in Fig. 3(B). The boxplot outlier detection identified one and two outliers in the PG content of MZ and DZ, respectively.
Fig. 3. A) Light microscopy images of Safranin O-stained sections; the superficial zone (SZ), middle zone (MZ), and deep zone (DZ) are defined from the surface to 10%, 10% to 30%, and 30% to 100% of cartilage depth, respectively. B) The optical density (i.e., PG content) of articular cartilage in SZ, MZ, and DZ.

Data analysis
In the present study, both µa and µ′s over the various windows of the NIR spectral range were found to contain crucial information about the PG content and biomechanical properties of articular cartilage, and, when combined with multivariate and machine learning regression and outlier detection techniques, they could reliably estimate the composition and functional integrity of articular cartilage. In particular, µa over the 4th NIR window yielded the best estimation of the PG content in SZ (ρ = 0.65) when combined with the SVR regression model (kernel = sigmoid, C = 10³, γ = 5 × 10⁻⁴) and the OCSVM outlier detection technique. µa over the 3rd NIR window optimally estimated the equilibrium modulus of articular cartilage (ρ = 0.75) with the SVR regression model (kernel = rbf, C = 10³, γ = 5 × 10⁻³) and the ISOFOR outlier detection technique. On the other hand, µ′s over the full NIR spectral range optimally estimated the PG content in MZ (ρ = 0.8267) with the SVR regression model (kernel = sigmoid, C = 10², γ = 2.5 × 10⁻⁴) and the ELEN outlier detection technique, while µ′s over the 2nd NIR window provided the optimal estimation of the PG content in DZ (ρ = 0.7672) with the RF regression model (number of trees = 10, maximum depth = none, minimum samples per split = 16, minimum samples per leaf = 1, bootstrap = true) and the ISOFOR outlier detection technique. Moreover, µ′s over the 2nd NIR window showed a significant capacity for predicting the instantaneous and dynamic moduli of articular cartilage (ρ = 0.9091 and 0.9167, respectively) with the PLS regression model (3 and 2 components, respectively) and the ELEN outlier detection technique. The details of the successful regression models that enable estimation of articular cartilage composition and functional integrity from its optical properties are highlighted in Table 1. Fig. 4 presents scatter plots of the measured and predicted composition and functional integrity values, and Table 2 summarizes the impact of the outlier detection techniques on the capacity of the optical properties to estimate the composition and functional integrity of articular cartilage.
Fig. 4. Scatter plots of the measured and predicted reference variables based on the combined optimal regression and outlier detection models. A) Optical density of the PG content in SZ based on µa over the 4th NIR window combined with SVR and OCSVM. B) Optical density of the PG content in MZ based on µ′s over the full NIR spectral range combined with SVR and ELEN. C) Optical density of the PG content in DZ based on µ′s over the 2nd NIR window combined with RF and ISOFOR. D) Instantaneous modulus (Inst.) based on µ′s over the 2nd NIR window combined with PLS and ELEN. E) Equilibrium modulus (Eq.) based on µa over the 3rd NIR window combined with SVR and ISOFOR. F) Dynamic modulus (Dyn.) at 1.0 Hz based on µ′s over the 2nd NIR window combined with PLS and ELEN. ρ is the correlation coefficient between the predicted and measured values.

Discussion
In this study, we assessed the potential of µa and µ′s in the NIR spectral range for estimating the depth-wise PG content and functional integrity of articular cartilage using multivariate and machine learning regression and outlier detection techniques, namely the PLS, SVR, and RF regression models and the boxplot, ELEN, OCSVM, LOF, and ISOFOR outlier detection methods. Although optical properties have been applied for tissue characterization and diagnostic purposes [20] and can differentiate between weight-bearing and non-weight-bearing cartilage in the 300-850 nm wavelength range [56], to our knowledge no study has evaluated their capacity for estimating cartilage composition or functional integrity. In the present study, we developed regression models for estimating cartilage PG content (in SZ, MZ, and DZ) and biomechanical properties from its optical properties in the NIR spectral range. As the integrity of cartilage is directly related to its matrix constituents, and its structure and composition vary with depth, estimating PG content at different tissue depths allows detailed characterization with potential clinical applications.
As light absorption and scattering in tissues are influenced by the different tissue constituents [15], the capacity of each optical property to estimate the concentration of these constituents depends on the relationship between that property and the constituent. The wavelength-dependent µa, combined with SVR, PLS, and RF, could reliably estimate cartilage PG content (Table 1). Specifically, SVR-based models combined with the LOF and OCSVM methods could estimate cartilage PG content in all three zones (SZ, MZ, and DZ) from µa over the full NIR spectral range and the 4th NIR window. PLS, combined with ELEN and OCSVM, could estimate PG content only in DZ, using µa over the 2nd and 3rd NIR windows. Similarly, RF could estimate cartilage PG content only in MZ, using µa over the 2nd and 3rd NIR windows together with the boxplot and LOF techniques. The most effective estimation of PG content in SZ was obtained using µa over the 4th NIR window combined with the SVR and OCSVM methods. The relationship between cartilage depth-wise PG content and µa, captured by the SVR, PLS, and RF models, suggests that PG macromolecules are strong absorbers of NIR light and thus strong contributors to the absorption profile of light traversing the articular cartilage matrix. The capacity of µa over several NIR windows to estimate the biomechanical properties of articular cartilage is likely due to an indirect relationship between µa and cartilage functional properties via PG content, which is known to govern the cartilage equilibrium modulus [57]. Moreover, the capacity of µa to estimate the PG content and biomechanical properties of articular cartilage over the various NIR spectral ranges supports the hypothesis that PG content across the cartilage extracellular matrix (ECM) has a direct impact on light absorption and penetration (0.3-2.4 mm) into the tissue in these spectral ranges.
Earlier preliminary results [58] suggested that µ′s combined with PLS could predict the depth-wise PG content of articular cartilage when the regression models were developed and cross-validated on the same dataset. However, new evidence from the present study suggests that µ′s cannot estimate the PG content in SZ when the combined cross-validation and blind-test-set approach is used. Nevertheless, µ′s combined with PLS, SVR, and RF can estimate the PG content in MZ and DZ over the various NIR spectral ranges. µ′s over the full NIR spectral range provided the best estimation of the PG content in MZ in combination with SVR and ELEN, whereas µ′s over the 2nd NIR window yielded the best estimation of PG content in DZ when the RF and ISOFOR techniques were used. Given the structural relationship between collagen and PGs in articular cartilage, in which the PG macromolecules are entrapped within the collagen meshwork, the relationship between µ′s and PG content is likely indirect, mediated by the confining collagen network, which is arguably the primary light-scattering agent in articular cartilage.
Additionally, the µ′s of articular cartilage, combined with PLS and SVR, could reliably estimate the biomechanical properties of the tissue. SVR combined with µ′s over several NIR spectral ranges and various outlier detection techniques was an effective estimator of articular cartilage biomechanical properties, and PLS combined with ELEN was the optimal regression model for estimating the instantaneous and dynamic moduli from µ′s over the 2nd NIR window. The relationship between µ′s and articular cartilage biomechanical properties is likely an indirect one, mediated by the collagen network. Collagen is the main fibrillar component of articular cartilage, providing its structural framework and playing a crucial role in its biomechanical function. Given its abundance (22% of the wet weight of the ECM), fibril size, and depth-dependent orientation, collagen is arguably the major light-scattering component of articular cartilage. Although chondrocytes may contribute to the light-scattering profile of cartilage, this contribution is likely minimal because cells make up only a small fraction (2%) of the tissue.
The performance and accuracy of the regression models (Table 1) suggest that SVR is the optimal regression algorithm for estimating the composition and biomechanical properties of articular cartilage, given its performance in estimating the depth-wise PG content and equilibrium modulus; however, PLS could provide a reliable alternative. The superiority of SVR is possibly due to its capacity to handle a relatively small number of samples with a highly sparse observation matrix [59] and to model nonlinear relationships between predictors and targets. One limitation of the present study is its small sample size: although the correlations between the measured and predicted reference variables are relatively high, a larger dataset could provide more reliable and generalizable outcomes with minimal bias.
In this study, bovine samples were used to evaluate the effectiveness of articular cartilage optical properties for predicting the composition and functional integrity of the tissue; these findings therefore require further validation using human samples. Additionally, although articular cartilage is composed of an intricate collagen-PG network, only the role of PG content in light absorption and scattering was investigated. Further investigation is therefore required to elucidate the role of collagen fibers in the absorption and scattering of light traversing the tissue matrix.

Conclusion
In conclusion, the optical properties of articular cartilage, when combined with machine learning techniques, particularly SVR, are effective estimators of its composition and functional integrity. This approach could enable estimation of the depth-wise structure and composition of the articular cartilage matrix and has potential for detailed tissue characterization, with possible clinical applications.