Associations of the Oxford Knee Score and knee arthroplasty revision at long‐term follow‐up

Self‐reported outcome measures are increasingly being collected for healthcare evaluation; therefore, it is prudent to understand their associations with patient outcomes. Our aims were to investigate: (1) whether the Oxford Knee Score (OKS) is associated with impending revision at long‐term (5 and 10 years) follow‐up, and (2) whether a decrease in OKS at subsequent follow‐ups is associated with a higher risk of revision.


Introduction
Patient-reported outcome measures (PROMs) capture patients' perceptions of their health, function and clinical treatment. 1,2 While PROMs were initially designed as measures for clinical trials, 3 there has been a recent push for PROM collection at a population level to inform risk prediction, cost-effectiveness and the aftercare of patients. 2,4,5 The successful establishment of national PROM programmes, such as the Australian Orthopaedic Association National Joint Replacement Registry (AOANJRR) PROMs programme, 6 the National PROMs programme in England, 7 and the incentivized Comprehensive Care for Joint Replacement model in the USA, 8 indicates that PROMs will become increasingly accessible and utilized in healthcare evaluation. It is therefore prudent to better understand the associations between PROMs and clinical outcomes.
Knee arthroplasty is an effective long-term treatment for patients with end-stage osteoarthritis (OA); however, approximately 9% of total knee arthroplasty (TKA) and 20% of unicompartmental knee arthroplasty (UKA) patients will ultimately undergo revision arthroplasty in their lifetime. 9,10 The Oxford Knee Score (OKS) is a 12-question PROM that captures pain and function. It is widely used because of its simple administration, good validity and consistency, and high patient compliance even when administered remotely. [11][12][13][14][15][16] The OKS at 6 months post-surgery is associated with impending revision (within 2 years following the score); 17 however, longer-term associations have not yet been reported. This is of interest for three reasons. First, the cumulative frequency of revisions continues to rise after 6 months, and even past the two-year mark. 18,19 Second, reasons for late revisions can differ from those for early revisions. 19,20 Early revisions (within the first 2 years) are more commonly associated with infections, pain and bearing dislocations (UKA), whereas later revisions (5+ years) are more commonly associated with aseptic loosening (TKA) and OA progression (UKA). Finally, patients with lower pre-operative scores tend to also report lower early (up to 2 years) post-operative scores, 21,22 so the long-term longitudinal associations of individual patient scores may be informative.
The aims of this study were therefore to investigate: (1) if OKS is associated with impending revision at long-term (5 and 10 years) follow-up and (2) if a decrease in OKS at subsequent follow-ups is associated with higher risk of revision.

Patients
The New Zealand Joint Registry (NZJR) has collected data on all primary and revision joint arthroplasties since 1999, with a capture rate greater than 95%. 23 Until 2002, OKS questionnaires were sent to all registered knee arthroplasty patients at 6 months post-surgery and then at five-year intervals, with a 75% response rate. 24 Questionnaires continue to be sent to all UKA patients. However, because of the logistical burden posed by the larger number of TKA patients, from 2002 questionnaires were sent to a random 20% sample of TKA patients. The NZJR has national ethical approval, and patients undergoing joint arthroplasty consent to the collection and use of their data.
All primary TKAs and UKAs performed between 1 January 1999 and 31 December 2019 with at least one OKS response at 6 months, 5 years or 10 years were included in this study (Fig. 1). Revision was defined as the addition, exchange or removal of one or more prosthetic components. At each timepoint, revisions were counted if they were undertaken within the two-year period following the score. Patients who did not reach two-year follow-up after the score, or who died within that two-year period without undergoing revision, were excluded from the analysis. At 6 months, there were 27 708 TKAs (1.2% revised within 2 years of the score) and 8415 UKAs (2.5% revised; Table 1). At 5 years, there were 11 519 TKAs (0.7% revised) and 3365 UKAs (1.4% revised). At 10 years, there were 6311 TKAs (1.1% revised) and 1744 UKAs (2.7% revised).
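The inclusion rule above can be sketched as a small classification function. This is a minimal illustration under stated assumptions, not the registry's code: the record fields (`score_date`, `revision_date`, `death_date`, `followup_end`) are hypothetical, and the two-year window is approximated as 730 days.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Assumption: the two-year window is approximated as 730 days.
WINDOW = timedelta(days=730)


@dataclass
class ScoredKnee:
    """Hypothetical record for one knee at one OKS timepoint."""
    score_date: date
    revision_date: Optional[date]  # assumed to be on or after score_date, if present
    death_date: Optional[date]
    followup_end: date  # hypothetical end of registry follow-up (e.g. data extraction date)


def revision_outcome(knee: ScoredKnee) -> Optional[bool]:
    """Return True if revised within the window, False if followed for the
    full window without revision, or None if the knee is excluded."""
    window_end = knee.score_date + WINDOW
    # Revisions in the two years after the score count as events.
    if knee.revision_date is not None and knee.revision_date <= window_end:
        return True
    # Death within the window without revision: excluded.
    if knee.death_date is not None and knee.death_date <= window_end:
        return None
    # Follow-up ended before the window closed: excluded.
    if knee.followup_end < window_end:
        return None
    return False
```

Returning `None` for excluded knees keeps them out of both the numerator and the denominator when revision percentages are computed at each timepoint.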

Patient-reported outcome measures
The OKS was scored from 0 to 48 (worst to best outcome), 14 categorized at each timepoint according to the Kalairajah classification (<27 'poor'; 27-33 'fair'; 34-41 'good'; >41 'excellent'), 25 and compared with the frequency of revision within the two-year period following the score. The frequency of revision within 2 years following the score for patients who reported a decrease in OKS of seven or more points from the previous follow-up was compared with that for all other patients, that is, those whose OKS was similar (a change of less than seven points in either direction) or increased. The seven-point threshold was based on estimates of the minimal important change (MIC) of the score. 26

Statistical analysis
SPSS Statistics v26 (IBM Corp., Armonk, NY) and Prism 8 (GraphPad, San Diego, CA) were used for statistical analyses. At each timepoint (6 months, 5 years and 10 years), logistic regression with OKS as a continuous variable and Kaplan-Meier analysis were performed to assess associations between the OKS and the risk of revision within 2 years following the score. Spearman's correlation coefficient (r) was used to test associations between the OKS at different timepoints. To provide clinically useful estimates, receiver operating characteristic (ROC) analysis was used to determine sensitivity and specificity when OKS cut-offs were applied according to the Kalairajah classification. 27 A p-value below 0.05 was considered significant.
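The score categorization and change-score rule described above can be expressed as two small functions. This is a minimal sketch assuming integer OKS values on the 0 to 48 scale; the function names are hypothetical, and the seven-point threshold is the MIC-based value stated in the text.

```python
# Kalairajah bands for the OKS (0 = worst, 48 = best), as listed in the text.
def kalairajah_category(oks: int) -> str:
    if not 0 <= oks <= 48:
        raise ValueError(f"OKS must be between 0 and 48, got {oks}")
    if oks < 27:
        return "poor"
    if oks <= 33:
        return "fair"
    if oks <= 41:
        return "good"
    return "excellent"  # > 41


# Seven-point threshold based on the minimal important change (MIC).
MIC_POINTS = 7


def decreased_by_mic(previous_oks: int, current_oks: int, mic: int = MIC_POINTS) -> bool:
    """True if the score fell by `mic` or more points since the previous follow-up.

    Smaller changes in either direction, and increases, fall into the
    'all other patients' comparison group."""
    return previous_oks - current_oks >= mic
```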
Using the Kalairajah classification, following up all patients with a 'poor' OKS (<27) with further assessment would achieve good specificity (>90%) but poor sensitivity (≤40%; Table 2). Around 10% of all patients scored below this threshold. In contrast, following up all patients with a less than 'excellent' OKS (<41) would achieve moderate specificity (50%-60%) with good sensitivity (60%-80%). Around 40% of patients scored below this threshold. The area under the curve (AUC) for the ROC curves ranged from 0.73 to 0.78.
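To make the cut-off analysis concrete, the sketch below computes sensitivity, specificity and the proportion of patients flagged when knees with an OKS below a chosen cut-off are selected for further assessment, together with the ROC AUC in its rank (Mann-Whitney) form, where a lower score indicates higher risk. The function names and the toy data are hypothetical and are not study data.

```python
from typing import Sequence, Tuple


def screening_stats(scores: Sequence[int], revised: Sequence[bool],
                    cutoff: int) -> Tuple[float, float, float]:
    """Flag knees with OKS < cutoff for further assessment.

    Returns (sensitivity, specificity, proportion flagged)."""
    tp = fn = fp = tn = 0
    for score, was_revised in zip(scores, revised):
        flagged = score < cutoff
        if was_revised and flagged:
            tp += 1
        elif was_revised:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp), (tp + fp) / len(scores)


def auc_low_score_positive(scores: Sequence[int], revised: Sequence[bool]) -> float:
    """ROC AUC as the probability that a revised knee scores lower than a
    non-revised knee (ties count one half)."""
    pos = [s for s, r in zip(scores, revised) if r]
    neg = [s for s, r in zip(scores, revised) if not r]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p < n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data for illustration only (not study data).
scores = [20, 30, 45, 40, 44, 25, 46, 38]
revised = [True, True, False, False, False, False, False, True]
print(screening_stats(scores, revised, cutoff=27))
print(auc_low_score_positive(scores, revised))
```

Raising `cutoff` flags more patients, which can only increase sensitivity and decrease specificity; this is the trade-off between the two Kalairajah thresholds reported above.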

Discussion
The recent push for PROM collection at national levels will allow for wider application of these measures in healthcare settings, but more research is needed to better understand the associations between PROMs and clinical outcomes. This study is the first to report that the OKS had a strong negative association with the risk of impending TKA and UKA revision up to long-term (10+ years) follow-up. Within individual patients, the OKS at 6 months was moderately correlated with the OKS at both 5 and 10 years for both TKA and UKA (r > 0.5, p < 0.01), suggesting that early scores were also indicative of later outcomes. Additionally, a decrease in OKS of seven or more points compared with the previous follow-up was associated with a higher risk of revision within 2 years of the score. Few studies to date offer similar comparisons; however, some have reported that patient outcomes immediately following surgery may be indicative of short-term recovery. For example, patients with slow pain recovery in the first 8 weeks after surgery were more likely to report persistent pain at 6 months follow-up. 28 Similarly, patients achieving less than 90° flexion at 8 weeks could be identified by more limited flexion and slower range-of-motion recovery from day one. 29 While these studies are limited to early recovery, they support an association between early outcomes and the overall recovery trajectory.

Fig. 3. Percentage of revisions within 2 years following the score for patients who reported a decrease in Oxford Knee Score of seven or more points compared with previous follow-up versus those who did not, at 5 years and 10 years, for (a) total knee arthroplasty (TKA) and (b) unicompartmental knee arthroplasty (UKA) patients. Groups: change score (6mo-5y) and change score (5y-10y). mo, months; y, years.
Recently, it has been proposed that PROMs may also have utility for patient aftercare decision-making. 4,5 Patients undergoing knee arthroplasty are recommended to have periodic long-term surveillance for prompt identification of those who may be at risk of revision; [30][31][32] however, this is unsustainable given increasing clinic burden. 33 We therefore considered the utility of the OKS as a screening tool for the surveillance of knee arthroplasty patients. The study findings suggest that patients who report poor early scores, or a decrease of seven or more points at subsequent follow-ups, have a higher risk of revision and could be targeted for more frequent follow-up. To indicate the patient volume that OKS screening would generate, a cut-off of 27 points would allow for high specificity (>90%) but poor sensitivity (≤40%), with around 10% of patients indicated for further follow-up. In contrast, a cut-off of 41 points would allow for moderate specificity (50%-60%) but high sensitivity (60%-80%), with around 40% of patients indicated for further follow-up. There is a trade-off between sensitivity and specificity with screening tools, 34 and optimal cut-offs would depend on institution-specific requirements. For example, practices that maintain routine clinic follow-up for all patients could elect to maintain high sensitivity to maximize capture of at-risk patients. In contrast, practices that have abandoned long-term clinic follow-up could maintain high specificity to avoid recalling large volumes of patients who would otherwise not be followed up under standard practice. It is also important to note that while our findings suggest that the OKS has some utility for remote patient surveillance, the moderate specificity achieved with high-sensitivity cut-offs suggests that discrimination could be improved by combining the score with other assessments, such as other PROMs or radiographs. 35
This study had several limitations. We did not have access to pre-operative scores, so we could not perform analyses based on change from pre-operative scores. However, the strengths of this dataset include its large size, long-term follow-up and high capture rate of revision surgery across a national population. 23 In addition, the analyses based on the post-operative OKS showed good predictive ability for revision and moderate discrimination between those who required revision and those who did not. Another limitation was that the study was based on a subsample of patients because of the NZJR sampling strategy and some non-response. However, revision rates were similar for those who completed an OKS questionnaire and those who did not (TKA: 6 months, 1.2% versus 1.5%; 5 years, 0.7% versus 0.7%; 10 years, 1.1% versus 0.8%; UKA: 6 months, 2.5% versus 3.1%; 5 years, 1.4% versus 2.1%; 10 years, 2.7% versus 3.1%), which does not suggest selection or response bias.
In conclusion, a poor OKS and a decrease in score of seven points or more are associated with an impending risk of knee arthroplasty revision from early to long-term follow-up. The associations of the score with revision suggest that the OKS has some utility as a screening tool for patient surveillance, and the OKS thresholds with varying sensitivity and specificity presented here provide useful estimates that can help inform clinical practice.