Machine Learning Prediction of Collagen Fiber Orientation and Proteoglycan Content From Multiparametric Quantitative MRI in Articular Cartilage

Machine learning models trained with multiparametric quantitative MRIs (qMRIs) have the potential to provide valuable information about the structural composition of articular cartilage.

challenging due to lack of specificity of the parameter to any single matrix component. [3][4][5] Moreover, substantial overlap of the parameter values between normal and degraded cartilage makes it even more challenging to categorize degrees of degradation. [6][7][8] Articular cartilage is composed of specialized cells called chondrocytes and extracellular matrix (ECM), which includes water, ions, collagen, proteoglycans (PGs), and noncollagenous proteins. 9,10 Chondrocytes synthesize and maintain the matrix components. Collagen is organized into a network of fibrils, which varies in orientation along the cartilage depth and provides cartilage its tensile strength, whereas PGs are large aggregates of glycosaminoglycans that are responsible for the compressive stiffness. The interaction of these macromolecules with the matrix fluid and ions forms the physical structure and functional properties of articular cartilage. Disorganization of collagen fiber network and PG content loss are known indicators of OA. 11 Improved specificity of the qMRI parameters to a particular matrix component is key for accurate tissue characterization.
Previous studies have reported a more accurate classification of degenerated versus normal cartilage using multivariate analysis as compared to the conventional univariate analysis. [12][13][14][15] Lin et al have observed that combinations of qMRI parameters with support vector machine (SVM) algorithms exhibited favorable classification properties to categorize normal and degraded cartilage and stronger correlation with tissue biochemical components compared with individual MRI parameters. 16 Most of the studies applying advanced machine learning methods to musculoskeletal imaging have focused on classification, segmentation, and pattern recognition. [17][18][19] However, few studies have been focused on computational models and machine learning to determine cartilage compositional features. [20][21][22] Alterations in T2 maps have been found to be associated with changes in proteoglycan (PG), collagen content, and collagen orientation qualified by a phenomenological constitutive model. 20 Thüring et al investigated the correlation between distinct compositional and structural tissue properties, characterized by a discretized numerical model, and a set of qMRI parameters including T 1 , T 1ρ, and T 2 * maps. 22 Linka et al demonstrated that multiparametric qMRI combined with artificial neural network (ANN) regressor can be used to specify PG and collagen content accurately. 21 However, accurate noninvasive characterization of cartilage compositional and structural features based on clinically applicable medical imaging remains challenging.
The emerging machine learning paradigm can play an important role in addressing cartilage structural and compositional modeling. The present study aims to explore the accuracy and feasibility of different machine learning models, including random forest (RF), support vector regression (SVR), gradient boosting (GB), multilayer perceptron (MLP), and Gaussian process regression (GPR), using multiple qMRI parameters in prediction of PG content and collagen orientation, determined by quantitative histology and polarized light microscopy (PLM), respectively.

Sample Preparation
The subsequently described animal operation was approved by the Ethics Committee of Utrecht University for Animal Experiments in compliance with the Institutional Guidelines on the Use of Laboratory Animals. 14,23,24 Quantitative data for osteochondral specimens (N = 20) were obtained from an open dataset (http://doi.org/10. 5281/zenodo.3893218) using an animal model of posttraumatic osteoarthritis. 25 Briefly, tissue samples were gathered from the medial femoral ridge of the bilateral stifle joints of 10 Shetland ponies, two tissue samples from left and right stifle joints (mean age AE standard deviation = 9.3 AE 3.8 years). In seven ponies, cartilage lesions were surgically induced in the medial femoral ridges and then treated using different experimental repair procedures. The remaining three animals were neither operated nor treated and were used as controls. The animals were killed after 12 months, and triangular shaped samples were collected. The samples from operated animals included part of the lesions and surrounding intact cartilage tissue. 14,23,24 MRI A 9.4 T vertical bore small animal MRI scanner (Oxford instruments Plc, Witney, UK) was used with a 19-mm diameter quadrature volume RF transceiver (Rapid Biomedical GmbH, Rimpar, Germany). Tissue samples were positioned in the scanner such that the main magnetic field was approximately perpendicular to the normal of the bone-cartilage interface to minimize the magic angle effect. A single slice was selected, as detailed in the literature, 25 covering a predefined marked regions on each specimen. The MRI protocol consisted of a single-slice fast spin echo (FSE) coupled with six different magnetization preparations including T 1 with six TIs between 0.2 and 3seconds, T 2 with seven TEs between 8.7 and 80 msec, continuouswave T 1ρ (CWT 1ρ ) with seven spin lock times (TSLs) between 0 and 128 msec, 26 adiabatic T 1ρ (AdT 1ρ ), AdT 2ρ, 27 and relaxation along a fictitious field (T RAFF ) consisted of five different pulse trains incremented from 0 to 16 pulses with pulse duration 9 msec. 28 The following imaging parameters were used for the FSE readout: TR = 5 seconds (7 seconds for T 1 ), echo train length (ETL) = 6 (8 for T 1 and CWT 1ρ ), effective echo time (TE eff ) = 4.2 msec, slice thickness = 1 mm, matrix size = 192 Â 192, FOV = 19.2 Â 19.2 mm 2 , resolution = 0.1 Â 0.1 Â 1 mm 3 . CWT 1ρ measurements were repeated for nine different spin-lock frequencies (100, 200, 300, 400, 500, 600, 800, 1000, and 2000 Hz).
Relaxation time maps (Fig. 1) were reconstructed from signal intensities using Aedes software (http://aedes.uef.fi/). 23 Four 1-mm wide regions of interest (ROI1-ROI4) were defined to cover the entire cartilage region regarding a spectrum of biomechanical properties, from normal to severely degenerated cartilage. The regions were in visually intact cartilage at increasing distances from the lesion, with ROI1 the closest and ROI4 the most remote regions to the lesion. The ROIs were closely matched with predefined marked anatomical locations (for histology measurement) as defined in the previous study 25 (Fig. 1). Depth-wise profiles of the qMRI parameters were obtained for each ROI. For this purpose, pixel values were first averaged along the width of the ROIs, interpolated along the depth perpendicular to bone-cartilage interface, and then normalized across the cartilage thickness to eliminate differences in cartilage thickness between subjects and profiles obtained from quantitative histology measurements. 25,29 Quantitative Histology From Digital Densitometry and PLM Cartilage compositional proteoglycan content (PG content) and structural features (collagen fiber orientation) were determined by quantitative histology from digital densitometry (DD) and PLM, respectively. These measurements were used as ground truth references in the regression models. The histology measurements were carefully matched to the MRI slices and described comprehensively in earlier studies of these specimens. 25,29 In brief, the samples were formalin-fixed, decalcified, and sectioned to include the indentation test locations and the sites of the MRI defined ROIs. 25,29 The staining of sections was done using Safranin-O to measure PG content via optical density. 24,30 DD analysis was carried out using a light microscope (Nikon Microphot-FXA, Nikon Co., Tokyo, Japan) equipped with a monochromatic light source and 12-bit CCD camera (ORCA-ER, Hamamatsu Photonics, K.K., Hamamatsu, Japan). 14 Unstained sections were prepared for PLM measurement, which quantifies collagen fiber angles in the tissues. 31 An Abrio PLM imaging system (Cri, Inc., Woburn, Ma, USA), mounted on a conventional light microscope (Nikon Diaphot TMD, Nikon Inc., Shinagawa, Tokyo, Japan) was used to determine the orientation angle of collagen fibrils in the samples (0 = parallel to cartilage surface; 90 = perpendicular to cartilage surface). The average depth-wise profiles perpendicular to the cartilage surface were determined from three histological slices for PLM and DD and then normalized across the cartilage depth to match the qMRI profiles.

Machine Learning Models, Validation, and Comparison
For each of the 14 qMRI parameters, DD (PG content) and PLM (collagen fiber orientation), our dataset contained 1600 depth-wise data points collected from four cartilage full-thickness ROIs (each ROI with 20 depth-wise data points) across 20 samples. Then, five commonly used regression models were developed: 1) random forest (RF) that is an algorithm with ensemble learning method for regression; 2) support vector regression (SVR) that is used to predict discrete values according to a predefined error margin; 3) gradient boosting (GB) that is a variant of ensemble methods where multiple models are generated and combined to get a better performance; 4) multilayer perceptron (MLP) that is a fully connected class of feedforward artificial neural network; and 5) Gaussian process regression (GPR) that is a probabilistic machine learning framework based on Bayesian approach, it makes predictions by incorporating prior knowledge and provides uncertainty estimations. All models and hyperparameters were based on Python 3.5 in the SciPy 1.7.3 environment using the scikit-learn library. The models considered all the data points within the ROIs and they were used to predict the DD (PG content) and PLM (collagen fiber orientation) once trained with the qMRI parameters. The data points obtained from the depth-wise profiles of the qMRI measurements were fed into a 10-fold nested cross-validation, which was developed to include inner and outer loops (Fig. 1). Train and validation subsets were randomly selected from data points regardless of ROI's location or sample type. In each iteration of the outer loop, the train and validation subsets (90% of dataset, 1440 data points pooled randomly) were forwarded to the inner loop, which is a normal 10-fold cross-validation with a grid search. The hyperparameters of the machine learning regression algorithms that control the learning process were optimized in the inner loop based on the model's performance. The model with best parameters (Table 1), regrading performance metrics, was returned to the outer loop to be evaluated using the unseen portion of the dataset, the test subset (10% of dataset, 160 data points selected randomly).
To analyze the performance of the models, three performance metrics, R 2 (goodness-of-fit), root mean squared error (RMSE) and mean absolute percentage error (MAPE), were gathered from each iteration of the outer loop. An example is given for the GPR model to clarify the procedure. Train and validation subsets, 1440 data points that are randomly selected, are fed to the inner loop. It is a 10-fold cross-validation that is used to find GPR's best hyperparameters based on its performance metrics. Then, a new GPR model with the selected hyperparameters is built and tested with the remaining 160 data points in the outer loop, which is another 10-fold cross-validation. The GPR models are built from scratch and test subsets are selected randomly by splitting dataset with nonoverlapping data points in each iteration of the outer loop. 32 The average values of the performance metrics, gathered from the outer loop, were used to compare models. The values are presented in last row of Tables 2 and 3 for DD (PG content) and PLM (collagen fiber orientation) respectively. Then, the experiments were repeated using the model with the best performance (highest accuracy and lowest error) and two subsets of qMRI parameters selected by three MRI physicists (V.C., with more than 10 years of experience; M.J.N, with more than 15 years of experience; and M.T.N, with more than 20 years of experience) based on clinical translatability and low specific absorption rate: F 1 (T 1 , T 2 , CWT 1ρ [200, 400, 600 Hz]) and F 2 (F 1 plus AdT 1ρ and T RAFF ).

Statistical Analysis
Normality was evaluated with Shapiro-Wilk test throughout this study, and Spearman's Rho test was used to measure the strength of association between predicted and measured values. A P value of 0.05 was considered as the limit of statistical significance. All computations were carried out using Matlab Deep Learning Toolbox (Mathworks Inc. Natick, Massachusetts, USA). Tables 2 and 3, prediction performances were better with GPR, MLP, GB, and RF. GPR produced superior performance over the other models in all metrics, with R 2 of 0.75 and 0.66 for structural (PLM) and compositional (DD) features, respectively. MLP, GB, and RF showed almost identical performance (eg R 2 was between 0.68 and 0.70 for PLM and between 0.62 and 0.65 for DD). Predictions based on SVR were less accurate compared to other models, particularly in DD prediction (R 2 = 0.44, RMSE = 3.25, and MAPE = 130%). A relative overestimation can be seen clearly in ROI1, more pronounced in SVR-based predictions of the compositional feature, DD (Figs. 2 and 3). Spearman's Rho correlation coefficients of measured and predicted cartilage features were significant for   The machine learning models are random forest (RF), support vector regression (SVR), gradient boosting (GB), multilayer perceptron (MLP) and Gaussian process regression (GPR). The average values (AVGs) of the performance metrics, gathered from the outer loop, are presented in last row of the table. The machine learning models are random forest (RF), support vector regression (SVR), gradient boosting (GB), multilayer perceptron (MLP) and Gaussian process regression (GPR). The average values (AVGs) of the performance metrics, gathered from the outer loop, are presented in last row of the table.

As shown in
all ROIs. The stronger correlations were found with GPR (Rho = 0.77-0.88 for PLM, Rho = 0.71-0.83 for DD) as compared to the other models (Table 4). All correlations were statistically significant; however, lower correlation coefficients were observed in ROI1 compared to the other ROIs with all models. Our results showed stronger correlations and lower  (Tables 2 and 3, Fig. 4). The two subsets selected by the MRI physicists were F 1 (T 1 , T 2 , CWT 1ρ [200, 400, 600 Hz]) and F 2 (T 1 , T 2 , CWT 1ρ [200, 400, 600 Hz], AdT 1ρ , T RAFF ) subsets. We observed an improvement in GPR performance when trained and tested with the selected subsets of qMRIs: the F 2 subset resulted with R 2 values of (0.78 for PLM and 0.69 for DD) and RMSE (1.28 for PLM and 2.44 for DD). The F 1 subset did not improve results and the GPR prediction performances were unsatisfactory (R 2 = 0.64 and 0.56 for PLM and DD, respectively; RMSE = 1.63 and 2.94 for PLM and DD, respectively, Tables 5 and 6).

Discussion
In this study, we investigated the performance of different machine learning models to determine cartilage matrix components from qMRI multiparametric data. The predicted profiles of PG and collagen fiber angles were closely similar to those obtained from actual DD and PLM measurements. The most accurate predictions were obtained using GPR regression, which outperformed all the other implemented models with stronger correlations between predicted and measured cartilage features observed in the regions that were not adjacent to lesions. Another important finding was the improvement observed in GPR when trained only with T 1 , T 2 , CWT 1ρ (200, 400, 600 Hz), AdT 1ρ , and T RAFF .
Despite growing interest in machine learning applications in the medical imaging, only a few studies have investigated their use for characterization of cartilage matrix components. [20][21][22] Linka et al 20 were one of the first to study the association between qMRI techniques and human articular cartilage tissue features using computational models, where a constitutive model was implemented to correlate the compositional and structural intratissue changes under mechanical loading to the corresponding changes in T 2 map. Similarly, Thüring et al 22 utilized a discretized computational modeling of cartilage capturing fluid fraction, PG and collagen content, and collagen fiber orientation and showing that T 1 and T 2 * were significantly correlated with the modeled compositional parameters. In another study, 33 a finiteelement framework was developed based on anisotropic constitutive model of cartilage. It was trained and tested with qMRIs and optimized against Mankin score as an indicator of histological tissue degeneration. The model was able to capture the biomechanical properties with an exceptional accuracy R 2 = 0.966. In accordance with previous studies, our finding confirms that qMRI can be used for characterization of cartilage matrix components. Linka et al 21 reported a strong association between ANN-based predictions and measured PG contents (Rho ¼ 0:80). Consistent with the early reports on multiparameteric MRI, 21,22 we have better precision with the developed models by combining multiple qMRI parameters. Moreover, we observed similar correlation coefficients using MLP regression in multiple regions, which can be explained by the fact that MLP is a class of feedforward ANN. Therefore, our findings are consistent with the previous results showing a close correspondence between predictions and measurements in both cartilage compositional and structural features.
In this study, multiparametric qMRI was utilized to noninvasively estimate (with acceptable accuracy ranges) the most important histological biomarkers (PG content and   collagen fiber orientation). This allows the evaluation of subtle tissue changes not only of structure and morphology but also of microstructural composition, which is otherwise only attainable with invasive histological examinations. Moreover, the findings of this work indicate that the number of qMRI parameters can be reduced and still the models will predict the histological biomarkers with the same level of accuracy, improving scan efficiency. In addition to the routinely used T 1 and T 2 parameters in clinical imaging, rotating-frame relaxation time parameters (AdT 1ρ , T RAFF , and CWT 1ρ at low spin-lock field) were found to increase the prediction accuracy due to their inherent sensitivity to slow molecular motion. 29 In CWT 1ρ, a radiofrequency (RF) pulse with a constant spin-lock frequency is applied either on-or off-resonance, whereas in AdT 1ρ, the spin-lock frequency is modulated overtime, creating a wide range of effective frequencies which extend the sensitivity of the adiabatic RF pulse to many frequencies of molecular fluctuations. 27 These rotatingframe parameters not only carry complementary information but also allow imaging with substantial reduction in specific absorption rate (SAR), making them suitable in the clinical arena. 28,29,34 On the other hand, AdT 2ρ and CWT 1ρ at high spin-lock fields (≥600 Hz), which are intrinsically challenging to translate in clinical setting due to their high-power deposition and SAR limitations 34 were found to have no impact on the accuracy level.
The findings in this study suggest that the machine learning regression models were more successful in capturing variance within structural (PLM) than compositional feature (DD). The criteria for selecting these models were 1) to explore wide range of learning algorithms and 2) to evaluate the approximation of nonlinearity in predictions. Also, a 10-fold nested crossvalidation was implemented to provide 1) a better generalization of the performance considering all the data points, 2) an unbiased evaluation with minimized overfitting, and 3) a prediction of all 1600 data points. The GPR regression demonstrated a better performance in all metrics and stronger correlations comparing to the other regressions. GPR is a nonparametric Bayesian approach with kernel functions, which was optimized in inner loop. This provides an accurate probabilistic model to predict new observed data points. 35 However, our predictions were less accurate in the ROI adjacent (ROI1) to the lesions, where weaker correlation coefficients were found in comparison to the other regions. One possible explanation is that qMRI data were measured from two types of specimens (specimens with surgical lesions and healthy control specimens without lesions). This difference particularly affects predictions in ROI1, as it is the closest region to the lesion with significant difference between degenerated and intact cartilage. 25 Leave-one-out cross-validation has already been used for testing the performance of the machine learning algorithms. 21 Linka et al trained the developed algorithms in his work with all samples except one which was used for testing. 21 This procedure iterated over all samples, with 89 AE 23 (mean AE SD) pixels per sample on average: each sample was left out once to test the model's performance. 21 It is very likely that leaveone-out cross-validation leads to underfitting in the current study due to insufficient data points per sample. After preprocessing as described in the section Materials and Methods (averaging, interpolation, and normalization), our dataset included 1600 data points gathered from 20 samples: with 80 data points per sample across four ROIs. To tackle this probable issue, a 10-fold nested cross-validation with inner and outer loops was developed to estimate the error of the models and find the best hyperparameters using data points, as detailed in the section Materials and Methods. The models were evaluated using the hold out test subset in the outer loop to reduce the probability of overfitting and prevent information leakage. 32 Feature selection techniques, such as principal component analysis, can be another source of concern since the regression models are not able to produce accurate predictions when trained with limited features, two or three qMRIs, and insufficient data points per sample. Thus, two subsets of qMRIs were selected, as detailed in the section Materials and Methods, by three MRI physicists to repeat the machine learning experience.

Limitations
We implemented a nested cross-validation for model selection and evaluation, based on a pointwise approach toward the dataset. However, these results need to be interpreted with caution. Spatial association and pixel resolution were faded away due to averaging ROIs along the width and interpolation along the depth. This can be a possible source of error that introduces overfitting. Given sample-specific training, our models were not able to capture the relationships between the input and output variables accurately, as we had limited number of data points coming from few analyzed regions, a small sample size and no separate samples for testing. Finally, intersample variability in sample position and measurement configuration were further limitations.

Conclusion
The findings of our study suggest that multiparametric qMRI data in combination with regression models can be used to determine cartilage compositional and structural features noninvasively with high accuracy. Among the five applied models, GPR algorithm showed the best prediction performances. Good agreement was observed between quantitative histology measurements and predictions from cartilage qMRI parameters, with better estimates obtained for collagen fiber orientation than for proteoglycan content.