Macroscopic ICRS Poorly Correlates with O’Driscoll Histological Cartilage Repair Assessment in a Goat Model
Clinical Research on Foot & Ankle

Background: The purpose of this research was to evaluate whether macroscopic assessment of the quality of repair cartilage in talar osteochondral defects in a goat model, using the ICRS score, corresponds with histological assessment using the O’Driscoll histology score. Methods: Thirty-two caprine samples with 6 mm osteochondral defects treated with microfracture were analyzed six months postoperatively using high-resolution digital images. Two observers independently scored the defects using the ICRS (0-12 points). Histological analysis was performed by one expert histologist using the O’Driscoll score (0-24 points) on 5 µm sections stained with Masson Goldner and Safranin O. Total ICRS and O’Driscoll scores as well as sub items were compared using Spearman's correlation coefficient (significance level p<0.05). Results: The median ICRS scores for Observers 1 and 2 were 6.5 (range: 4-11) and 6.5 (range: 3-11), respectively. The median O’Driscoll score was 11.5 (range: 3-20). The correlation between the total ICRS scores and the O’Driscoll score was not significant, nor was the correlation between sub items (p>0.05). Conclusion: This animal study suggests that isolated macroscopic ICRS assessment of cartilage repair tissue does not correlate well with histological assessment. Possible explanations are the limitations of surface assessment compared to analysis deeper into the tissue, and the need for a more elaborate macroscopic assessment that includes hypertrophy, colour, lesion size, location and degenerative status of the joint. Techniques that are more accurate, precise and reliable, such as histology, dGEMRIC and T2 mapping MRI, contrast enhanced CT or optical coherence tomography (OCT), should be considered as alternatives or at least as complementary methods.


Introduction
Histological evaluation of cartilage repair tissue after treatment of osteochondral defects is a longstanding and proven method for quality assessment with both qualitative and quantitative parameters [1]. Experimental animal studies often use histological analysis of the repair tissue after sacrifice of the animals [1][2][3][4][5][6][7][8][9]. In clinical studies, a biopsy can be taken for histology [10][11][12]. Although histology is considered the gold standard, biopsies are usually not performed in a clinical setting because they destroy part of the repair tissue. In addition, the histological processing and subsequent analyses take time. As a result, the quality of repair cannot be assessed from the biopsy results and acted upon surgically within a single session, which necessitates a two-step procedure. Moreover, a biopsy can only show the characteristics of the particular part of the lesion from which it was taken. Locational differences within the repair area are not detected. Furthermore, the quality of the biopsy and the time point during follow-up at which it is taken affect its result [13]. Therefore, histology of in vivo repaired tissue is generally restricted to a research setting, while in clinical practice cartilage quality is assessed through imaging and intraoperative macroscopic evaluation by the surgeon [14].
Macroscopic assessment of the repair tissue during (second look) arthroscopy is used to assess the degree of defect fill, the aspect of the repair tissue and its integration with adjacent cartilage after treatment [15]. It has been shown to be associated with the clinical failure rate [16]. Two validated grading systems in the literature assess repair tissue quality during arthroscopy or open surgery: the Oswestry Arthroscopy Score (OAS) [17] and the International Cartilage Repair Society Cartilage Repair Assessment System (ICRS) [18]. The main components of both scores are the nature of the tissue (macroscopic appearance of the cartilage surface) and whether the repair tissue is satisfactory (the extent to which the original defect is filled with repair tissue and the integration of the repair tissue into the border zone). Both scores are used in the evaluation of human as well as animal repair tissue [5,19,20]. Other studies have reported satisfactory interobserver reliability and repeatability for both the ICRS and the OAS arthroscopic score, with an ICC>0.7 and good correlation (Pearson's correlation coefficient, r=0.88; P<0.001). Cronbach's alpha was slightly better for the ICRS: 0.91 vs. 0.82 for the OAS [17,21].
No data are available that compare the macroscopic scores with the findings of histological quality assessment scores. It is therefore unknown to what extent these macroscopic scores correlate with the histological reference standard. A good correlation would strengthen diagnosis and evaluation based on arthroscopic assessment instead of histology, whereas a poor correlation would indicate that the score is insufficient for objective assessment of cartilage repair tissue. The aim of this study was to evaluate the correlation between the macroscopic ICRS score and a histological score of repaired cartilage. The hypothesis was that the ICRS corresponds moderately with histological analysis, because the ICRS evaluates the surface of osteochondral lesions whereas histology also assesses deeper tissue layers.

Materials and Methods
Materials: Thirty-two caprine samples of treated osteochondral defects were analysed 6 months postoperatively. The samples were retrieved from a study investigating the healing response of artificially created talar osteochondral defects in 16 goats treated with microfracture [22]. The study protocol was approved by the local Animal Welfare Committee (protocol number ORCA102287). A 6 mm diameter osteochondral defect was drilled in the tali of both hind legs using a posterolateral surgical approach. In the same surgical session, the goats received microfracture treatment using microfracture awls. The animals were allowed to bear weight immediately after surgery. The goats were sacrificed after a follow-up of 24 weeks, after which the tali were extracted and photographed from multiple directions using a high-resolution digital camera (Panasonic Lumix, Kadoma, Japan).
The tali were cut into 20 mm × 20 mm blocks around the defect over the entire depth of the talus [22]. The samples were embedded in methyl methacrylate and 5 µm sections were cut at approximately one quarter and at the centre of each defect. Haematoxylin and Eosin (general staining), Safranin-O (GAG content) and Masson Goldner (collagen) staining were performed on multiple sections for each location. Representative sections from both locations were selected for cartilage quality assessment.

                                        Observer 1   Observer 2
Degree of defect repair
  In level with surrounding cartilage       16           14
  75% repair of defect depth                14           14
  50% repair of defect depth                 2            3
  25% repair of defect depth                 0            1
  0% repair of defect depth                  0            0
ICRS grade
  Grade I (normal)                           0            0
  Grade II (nearly normal)                  10           15
  Grade III (abnormal)                      22           16
  Grade IV (severely abnormal)               0            1
Table 1: Summary of the ICRS score. Sub item scores, total score and grading according to the ICRS are given.
Macroscopic scoring: The photographs of the samples were collected, blinded and randomized using a computer program. Two orthopaedic surgeons skilled in ankle arthroscopy (GK, MK) independently scored the high-resolution photographs using the ICRS score. The ICRS score consists of 3 items: degree of defect repair, integration to the border zone and macroscopic appearance, each scored from 0 to 4 (Table 1). The total score (ranging from 0 to 12) is categorized into 1 of 4 grades (normal, nearly normal, abnormal and severely abnormal) [15].
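The scoring rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the program used in the study. The function names are hypothetical, and the grade cut-offs (12 = grade I, 8-11 = grade II, 4-7 = grade III, 0-3 = grade IV) are an assumption taken from the commonly cited ICRS grading table; the text itself does not state them, although they are consistent with the score ranges and grade counts reported for both observers.

```python
def icrs_total(defect_repair: int, integration: int, appearance: int) -> int:
    """Sum the three ICRS sub items (each scored 0-4) into a total of 0-12."""
    items = {
        "defect_repair": defect_repair,
        "integration": integration,
        "appearance": appearance,
    }
    for name, value in items.items():
        if not 0 <= value <= 4:
            raise ValueError(f"{name} must be between 0 and 4, got {value}")
    return sum(items.values())


def icrs_grade(total: int) -> str:
    """Map a total ICRS score to one of the four grades.

    ASSUMPTION: the cut-offs below are not given in the text; they follow
    the commonly cited ICRS table. A total of 0 is treated as grade IV.
    """
    if not 0 <= total <= 12:
        raise ValueError(f"total must be between 0 and 12, got {total}")
    if total == 12:
        return "I (normal)"
    if total >= 8:
        return "II (nearly normal)"
    if total >= 4:
        return "III (abnormal)"
    return "IV (severely abnormal)"


if __name__ == "__main__":
    # Hypothetical defect: full fill, partial integration, rough surface.
    total = icrs_total(defect_repair=4, integration=2, appearance=1)
    print(total, icrs_grade(total))
```

Keeping the range checks in the scoring function means that a transcription error in a sub item fails loudly instead of silently shifting a sample into another grade.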
Histologic scoring: Thirty histological sections were available per sample. All were screened for quality of processing and staining. Selected histological sections were scored by 1 expert histologist (RvN) using the O'Driscoll histology score [23]. A recent review provides an overview of the existing histological scoring systems [9]. Both the O'Driscoll score [23] and the Pineda score [24] met our requirements of applicability for the assessment of cartilage repair and available validation for animal studies [25]. The O'Driscoll score was used because it is the more extensive of the two. It contains 4 main categories with sub items, giving a maximum total score of 24 points [23].

Statistical analysis
To test our hypothesis, the validity of the ICRS was determined by calculating the interobserver variability and the correlation between the O'Driscoll score and the ICRS. A sample size of 32 had 80% power to detect a minimum level of correlation of 0.4 (moderate correlation based on the definition of Landis and Koch [26]) with a 0.05 two-sided significance. A correlation above 0.4 was considered to be of possible significance, since both Landis and Koch and Fleiss et al. define a correlation below 0.4 as only poor to fair [26,27]. For the ICRS to be a reliable tool for cartilage repair assessment, a correlation coefficient that is less than moderate is undesirable. Due to skewed distributions and outliers, non-parametric Spearman's correlation coefficients were calculated between the total O'Driscoll and ICRS scores, as well as between specific subsets. The subsets were selected on the basis of the related themes of the scored items: Macroscopic appearance (ICRS) and Surface regularity (O'Driscoll score); and Integration to border zone (ICRS) and Bonding to the adjacent cartilage (O'Driscoll score). A p<0.05 was considered significant.
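As an illustration of the rank-based approach described above, the following minimal Python sketch computes Spearman's correlation coefficient as the Pearson correlation of the ranks, with tied values given the mean of their ranks. Ties matter here because both scores are integers on short scales. The function names and the example data are hypothetical and are not the study data; the statistical software used by the authors is not stated in the text.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the last position holding the same value as position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two sequences of equal length with n >= 3")
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical scores for eight samples, for illustration only.
    icrs = [6, 7, 4, 9, 11, 6, 8, 5]
    odriscoll = [12, 9, 15, 11, 3, 20, 10, 14]
    print(round(spearman_rho(icrs, odriscoll), 3))
```

The sketch returns only the coefficient; a p-value or confidence interval would need an additional step that is not shown here.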

Results
ICRS: The median ICRS score was 6.5 (range 4-11) for Observer 1 and 6.5 (range 3-11) for Observer 2 (Table 1). The defects were classified mainly as grade II (nearly normal, n=10 and n=15 for Observer 1 and 2, respectively) or grade III (abnormal, n=22 and n=16 for Observer 1 and 2, respectively). Only Observer 2 classified one defect as grade IV (severely abnormal). The inter-observer agreement was 50% with a κ of 0.4 (p<0.001) for the total ICRS score. The agreement for the sub items was higher.
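For readers less familiar with agreement statistics, the sketch below shows how a percentage agreement and an unweighted Cohen's κ can be computed for two observers. It is an assumption that an unweighted κ applies here; the text does not state whether the reported κ was weighted. The function names and the example ratings are hypothetical and are not the study data.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Fraction of samples on which both raters gave the same rating."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa: agreement corrected for chance agreement."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability that both raters pick the same category
    # if each rated independently according to their own marginal frequencies.
    categories = set(rater_a) | set(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / n**2
    if expected == 1:
        raise ValueError("kappa is undefined when both raters use one category")
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical ICRS grades from two observers, for illustration only.
    observer_1 = ["II", "II", "III", "III"]
    observer_2 = ["II", "III", "III", "III"]
    print(percent_agreement(observer_1, observer_2))
    print(cohens_kappa(observer_1, observer_2))
```

Because κ subtracts the agreement expected by chance, it can be modest even when the raw percentage agreement looks reasonable, which is why both figures are usually reported together.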
Histology: The median O'Driscoll score of the samples was 11.5 (range 3-20, Table 2). All defects were predominantly filled with fibrous tissue, with diminished Safranin O staining and with large collagen structures visible using polarized light microscopy. The variety in surface appearance and structural integrity was substantial (Table 2 and Figure 1).
Correlations: The Spearman's rank correlation coefficient between the total ICRS score of either observer and the O'Driscoll score was not significant (Observer 1: ρ=-0.004, p=0.98, 95% CI -0.40 to 0.37; Observer 2: ρ=-0.132, p=0.47, 95% CI -0.53 to 0.29; Figure 2). Likewise, the correlations were not statistically significant for the specific subsets of macroscopic appearance (ICRS) and surface regularity (O'Driscoll score).
Table 3: Inter observer agreement of the 2 observers for the ICRS score and correlations between the O'Driscoll score and the ICRS.

Discussion
The aim of this study was to assess the correlation between macroscopic assessment of cartilage repair tissue using the ICRS score and histological assessment using the O'Driscoll score after microfracture treatment of talar osteochondral defects in a goat model. Our hypothesis was that the scores would correlate moderately; however, no significant correlation was found between the two scores.
Strengths of this study are the use of a validated goat model, 2 surgeons with extensive clinical experience with arthroscopic cartilage repair, and an expert histologist familiar with cartilage repair histology. However, the number of samples was limited and the quality of the fibrous repair tissue did not cover the full range of the O'Driscoll score (range 3-20 vs. 0-24). These 2 factors may at least in part have contributed to the absence of a correlation. Also, the macroscopic scoring was not performed in vivo, but by means of post mortem photographs. Although this method is frequently used in a variety of studies for macroscopic scoring of cartilage [21,28,29], it does not allow assessment from all angles as in vivo arthroscopy does. Lastly, the study was designed to detect a correlation larger than 0.4, because a smaller correlation was considered to be clinically irrelevant. A weaker correlation could therefore still be present between the scores or sub items.
Apart from limitations in the study design, several characteristics of the ICRS score may also have contributed to the lack of correlation. Firstly, only extreme structural disorganizations such as large clefts can be registered by the macroscopic score (Figure 1A), whereas subtle structural differences, such as smaller fissures or cysts hidden below the macroscopic surface (Figure 1B), are not detected. This leads to overestimation of the quality of repair tissue by macroscopic assessment compared to histological measures (Figure 2).
Secondly, the lack of correlation could be explained by the role of individual judgement in the ICRS score, because both the degree of defect fill and the measurement of the demarcating border require the observer to determine quantitative values based on individual estimates. Repeated scoring by the same individual could therefore result in a different score. Previous articles did not specify the observers' degree of experience with ICRS scoring, but one article did show a significant increase in interobserver agreement after 2 months of training [30]. Whether this also influences the correlation with histology remains to be investigated.
Thirdly, the ICRS does not allow the observer to report graft hypertrophy, nor does it include the colour of the defect. Both items have been discussed in the literature as being of relative importance and are included in the Oswestry Arthroscopy Score. However, we were not able to detect a colour difference as used by the OAS (pearly hyaline-like, white fibrous tissue, yellow bone) between our samples with a good or a poor O'Driscoll score, since all were more or less white fibrous tissue.
Moreover, the O'Driscoll score assesses the borders in one plane and is location dependent, while the ICRS score takes the entire defect rim into consideration and judges the percentage of the rim that is attached. This explanation is supported by the fact that no correlation was found between the sub item scores for the integration of the repair tissue into the border zone.
The results indicate that, despite the satisfactory interobserver reliability found previously, ICRS scoring may not be an accurate way to determine cartilage or repair tissue quality during arthroscopy. Arthroscopy also allows for assessment of multiple domains such as the size of the lesion and the general state of degeneration of the joint. Since these are all features that influence the healing or possible deterioration of cartilage defects, these items could be added to the score to make ICRS scoring more accurate [27,31,32,33].
For research purposes, alternatives to histology that are more accurate, precise and reliable should be considered, such as dGEMRIC and T2 mapping MRI, contrast enhanced CT or optical coherence tomography (OCT) [34]. These techniques are also increasingly applied in clinical practice and, according to a growing body of literature on these advanced techniques, their image parameters correlate highly with cartilage quality [35][36][37][38].
In conclusion, this animal study suggests that isolated macroscopic assessment of the quality of cartilage repair tissue in talar osteochondral defects treated with microfracture, using the ICRS score, correlates poorly with histological analysis. Possible explanations are the limitations of surface assessment compared to analysis deeper into the tissue, and the need for a more elaborate macroscopic assessment that includes hypertrophy, colour, lesion size, location and degenerative status of the joint.